Abstract

Identifying the most influential nodes is of great importance for understanding and controlling spreading in complex networks, with applications in disease control, viral marketing, air traffic control, and many other fields. By taking the effect of the spreading rate on information entropy into account, we propose an improved information entropy (IIE) method. Compared with the benchmark methods on six empirical networks, the IIE method performs better in terms of Kendall’s Tau and the imprecision function under the Susceptible-Infected-Recovered (SIR) model. In the Facebook network in particular, Kendall’s Tau improves by up to 120% relative to the original IE method. The imprecision function confirms this result: its value for the IIE method is smaller than that of the benchmark methods in all six networks.

1. Introduction

Spreading phenomena can be seen everywhere in nature [1–7], and many activities can be described as spreading processes [8–13]. In recent years, many studies have focused on the spreading process because of its theoretical significance and practical value [14, 15], including rumor control [16–18], information diffusion [19–22], air traffic control [23–25], and viral marketing [26–28]. Among these topics, identifying the influential nodes in complex networks is an active area of research. Understanding node influence has yielded new insight into applications such as mining key nodes [29–34] and designing effective strategies to prevent epidemics from spreading or to accelerate information diffusion.

The identification of influential nodes is of great significance in epidemic and rumor control, targeted advertising, and air traffic planning [35, 36]. Researchers have put forward a variety of centrality methods to identify these nodes more efficiently. Degree centrality is a typical method that relies on local information [37, 38]. Following this idea, Chen et al. proposed the Local Rank method, which considers the 4th-order neighbors of a node [39]. By taking the location of nodes in the network into account through the k-shell decomposition method, Kitsak et al. [40] found that the most influential nodes are located at the core of the network. Many improved methods based on the k-shell decomposition [41–43] have since been proposed to identify the influential nodes. Closeness centrality [44] and betweenness centrality [45] are two path-based methods. Considering the influence of neighbors, Ren et al. [46] came up with the IRA method. Based on the IRA method, Zhong et al. [47] proposed the IIRA method by taking the propagation feature into account. Information entropy is also used as an important centrality to evaluate the influence of nodes [48, 49].

Most of the previous methods assume that a node’s influence depends on its own importance. Another key factor that cannot be neglected is the importance of its neighbors. On this basis, Guo et al. [50] put forward the information entropy (IE) method, which considers the neighbors’ information quantity. Nevertheless, the performance of the IE method is also affected by the propagation feature. In the example network presented in Figure 1, the influence of nodes 1 and 6 cannot be accurately identified by the IE method. In this case, we expect the number of neighbors and the spreading rate to have a positive effect on the target node. Based on this idea, we propose an improved information entropy (IIE) method in which the target node’s information entropy is affected by the propagation feature. Compared with the benchmark methods on six real networks, the IIE method performs better in terms of Kendall’s Tau and the imprecision function under the Susceptible-Infected-Recovered (SIR) model [51, 52].

2. IIE Method

The original IE method assumes that the influence of a node is obtained from the information entropy of its neighbors. In the IIE method, we argue that the spreading rate and the number of neighbors adjust this initial information entropy. The influential nodes are then identified by using the resulting final information entropy. The details of the IIE method are given below.

In general, an undirected network with a given number of nodes and edges can be described by an adjacency matrix: the entry for a pair of nodes is 1 if the two nodes are connected and 0 otherwise. We argue that the spreading rate and the number of neighbors adjust the target node’s information entropy. Thus, the IIE value of any node is calculated by Equation (1), which involves the information quantity provided to the target node by a neighbor, a term representing the influence of the propagation feature, the spreading rate, and the number of neighbors of the node.

Equation (1) can also be written in terms of a node’s higher-order neighbors, that is, the nodes at a given distance from it. When this distance is 1, the neighbors in question are the node’s direct neighbors.

To describe the IIE method in more detail, we fix the parameter values and consider the example network in Figure 2, for which the improved information entropy (IIE) of the black node (node 1) is calculated.
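The network representation used above can be sketched in a few lines of Python. This is an illustrative sketch only: it builds the 0/1 adjacency matrix of an undirected network and collects the neighbors at a given distance from a node, under the assumption that "neighbors of a given order" means nodes at exactly that shortest-path distance. The function names and the five-node example network are hypothetical and are not taken from the experiments.

```python
from collections import deque


def adjacency_matrix(n, edges):
    """Build the symmetric 0/1 adjacency matrix of an undirected network."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1
    return a


def neighbors_at_distance(a, source, distance):
    """Return the nodes whose shortest-path distance from `source` equals `distance`.

    Assumption: this is what is meant by neighbors of a given order.
    """
    n = len(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == distance:
            continue  # nodes beyond the requested distance are not needed
        for v in range(n):
            if a[u][v] == 1 and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v, d in dist.items() if d == distance}


# Hypothetical five-node network with edges 0-1, 0-2, 1-3, 3-4.
edges = [(0, 1), (0, 2), (1, 3), (3, 4)]
a = adjacency_matrix(5, edges)
print(neighbors_at_distance(a, 0, 1))  # direct neighbors of node 0
print(neighbors_at_distance(a, 0, 2))  # 2nd-order neighbors of node 0
```

A breadth-first search is used so that each node is assigned its shortest-path distance the first time it is reached; for large sparse networks an adjacency list would be a more economical representation than the dense matrix shown here.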

3. Results

3.1. Data Description

Six empirical networks are used to evaluate the performance of the IIE method. The US air network [53] represents part of the US air traffic network. The Polblogs network is a network of political blogs in the United States, connected by political relationships. The datasets are available on the web.

The e-mail network [54] is the electronic mail network of a university in Spain. The Soc-hamsterster network is a social network in which the edges between nodes indicate friendship or family ties. The Facebook network was derived from the Facebook online social platform, and its edges indicate interpersonal relationships. The LastFM network [55] was derived from an FM broadcast platform for Asian users, and its edges represent friendships between nodes. The statistical attributes of the six networks are listed in Table 1.

3.2. Measurement

In this paper, the spreading influence of a node is simulated with the SIR model [52]. The system has three compartments: susceptible individuals (S), infected individuals (I), and recovered individuals (R). In each time step of the SIR model, every susceptible neighbor of an infected node is infected with a certain probability, the spreading rate. In the same time step, each infected node recovers with a certain probability, the recovery rate, and can no longer be infected. The spreading influence of a node is the range of infection, that is, the number of nodes in the whole network that are infected when that node is the initial infected node. This range is averaged over repeated independent experiments.
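The simulation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the experiments: the names `sir_spread` and `spreading_influence`, the synchronous update order, and the choice to count the seed node itself in the infected range are all assumptions made for the example.

```python
import random


def sir_spread(adj, seed, beta, mu, rng):
    """Run one SIR realization from a single infected seed.

    adj  : dict mapping each node to a list of its neighbors
    beta : probability that an infected node infects a susceptible neighbor per step
    mu   : probability that an infected node recovers per step
    Returns the number of nodes that were ever infected (seed included).
    """
    if not 0 < mu <= 1:
        raise ValueError("mu must be in (0, 1] so that the outbreak terminates")
    infected = {seed}
    recovered = set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = (
                    v not in infected
                    and v not in recovered
                    and v not in newly_infected
                )
                if susceptible and rng.random() < beta:
                    newly_infected.add(v)
        # Nodes infected at the start of this step may now recover.
        newly_recovered = {u for u in infected if rng.random() < mu}
        infected = (infected - newly_recovered) | newly_infected
        recovered |= newly_recovered
    return len(recovered)


def spreading_influence(adj, seed, beta, mu, runs, rng_seed=0):
    """Average the infected range over `runs` independent realizations."""
    rng = random.Random(rng_seed)
    return sum(sir_spread(adj, seed, beta, mu, rng) for _ in range(runs)) / runs


# Hypothetical star network: node 0 is connected to nodes 1, 2, and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(spreading_influence(star, 0, beta=0.5, mu=1.0, runs=1000))
```

Ranking all nodes by the value returned from `spreading_influence` gives the reference list of "real" spreading influence against which a centrality ranking can be compared.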

Kendall’s Tau [56] and the imprecision function are used to evaluate the performance of the IIE method. Kendall’s Tau takes values in [−1, 1] and measures the correlation between two ranking lists: the higher its value, the stronger the correlation between the two lists. Kendall’s Tau can be expressed as

$$\tau = \frac{2}{N(N-1)} \sum_{i<j} \operatorname{sgn}\left[(x_i - x_j)(y_i - y_j)\right],$$

where $\operatorname{sgn}(z)$ is the sign function: if $z > 0$, $\operatorname{sgn}(z) = 1$; if $z < 0$, $\operatorname{sgn}(z) = -1$; and if $z = 0$, $\operatorname{sgn}(z) = 0$. $N$ represents the number of nodes in the lists, that is, in the network. $x_i$ and $x_j$ are the order values of nodes $i$ and $j$ in the ranking list calculated by the centrality method, and $y_i$ and $y_j$ are the order values of nodes $i$ and $j$ in the ranking list generated by the real spreading influence. If $\tau$ is close to 1, the two ranking lists are strongly correlated.
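As a worked illustration, Kendall’s Tau can be computed directly from its pairwise definition. The sketch below assumes the common form in which the signs of the products of pairwise differences are summed over all node pairs and normalized by the number of pairs; the function names and the small example lists are hypothetical.

```python
def sign(z):
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (z > 0) - (z < 0)


def kendall_tau(x, y):
    """Kendall's Tau between two lists of order values for the same nodes.

    x[i] is the order value of node i under a centrality method and
    y[i] is its order value under the real spreading influence.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("both lists must have the same length, at least 2")
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sign((x[i] - x[j]) * (y[i] - y[j]))
    return 2 * total / (n * (n - 1))


print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # identical rankings
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # reversed rankings
```

The double loop costs a number of operations quadratic in the number of nodes, which is acceptable for a sketch; for large networks an implementation based on merge sort would be preferable.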

The imprecision function evaluates the performance of a centrality method by calculating the average spreading ability of the top nodes in the ranking list obtained by that method. It is expressed as

$$\varepsilon(p) = 1 - \frac{M(p)}{M_{\mathrm{eff}}(p)},$$

where $p$ is the proportion of nodes to be selected, $N$ represents the number of nodes, $M(p)$ represents the average spreading influence of the top $pN$ nodes in the ranking list obtained by the centrality method, and $M_{\mathrm{eff}}(p)$ is the mean spreading influence of the top $pN$ nodes in the ranking list calculated by the SIR model. The closer $M(p)$ is to $M_{\mathrm{eff}}(p)$, the smaller $\varepsilon(p)$ is. A small value therefore means that the spreading influence of the top nodes selected by the centrality method is close to that of the nodes with the highest real spreading ability, which indicates that the centrality method is more accurate.
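The imprecision function can be illustrated with a short sketch. It assumes the usual form, one minus the ratio of the mean spreading influence of the top nodes chosen by the centrality method to the mean spreading influence of the truly most influential nodes. The function name, the rounding of the number of selected nodes, and the tie-breaking by node index are assumptions made for the example.

```python
def imprecision(centrality, influence, p):
    """Imprecision of a centrality ranking for a proportion p of selected nodes.

    centrality[i] : centrality score of node i
    influence[i]  : spreading influence of node i (e.g., from SIR simulations)
    """
    n = len(centrality)
    top = max(1, round(p * n))  # number of nodes selected (assumed rounding)
    # Nodes ranked by centrality; ties keep the lower node index first.
    chosen = sorted(range(n), key=lambda i: centrality[i], reverse=True)[:top]
    best = sorted(influence, reverse=True)[:top]
    m = sum(influence[i] for i in chosen) / top
    m_eff = sum(best) / top
    return 1 - m / m_eff


# Hypothetical scores for five nodes.
influence = [10, 8, 6, 4, 2]
print(imprecision([5, 4, 3, 2, 1], influence, 0.4))  # ranking agrees with influence
print(imprecision([1, 2, 3, 4, 5], influence, 0.4))  # ranking is reversed
```

A ranking that selects exactly the most influential nodes gives a value of zero, and the value grows as the selected nodes become less influential than the best possible choice.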

3.3. Simulation Results

We selected six real networks to test the IIE method. The spreading rate and the recovery rate of the SIR model are set separately for each network.

We first test how the distance parameter affects the performance of the IIE method. This parameter represents the distance between nodes. If it equals 1, the information quantity is provided to the target node by its direct neighbors; if it equals 2, the information quantity is provided by its 2nd-order neighbors. The influence of this parameter on the IIE method in the six networks is shown in Figure 3.

Figure 3 shows the effect of the distance on Kendall’s Tau calculated by the IIE method in the different networks. For one setting of the distance, Kendall’s Tau reaches its maximum in the US air, Polblogs, e-mail, and LastFM networks, which demonstrates that the IIE method is more accurate with this setting than with the other values in these four networks. The Soc-hamsterster and Facebook networks behave differently: there, Kendall’s Tau is largest at other values of the distance, for which the computation time of the IIE method increases dramatically. In addition, we know from the TDI theory [57] that individuals affect only a relatively small range of neighbors. Therefore, the distance is kept small in later experiments.

To check the effectiveness of the IIE method, the k-shell, degree centrality, closeness centrality, betweenness centrality, and IE methods are selected as benchmark methods and compared with the IIE method in the six networks. The SIR parameters and the distance are chosen as described above. As can be seen from Figure 4, in all six networks Kendall’s Tau obtained by the IIE method is considerably larger than that obtained by the benchmark methods, which indicates that the IIE method is superior to them. Figure 4 also shows that, in the US air and LastFM networks, Kendall’s Tau obtained by the IIE method gradually increases with the spreading rate. In contrast, in the Soc-hamsterster and Facebook networks, it decreases as the spreading rate grows. The Polblogs and e-mail networks show yet another pattern: as the spreading rate increases, Kendall’s Tau obtained by the IIE method first increases and then decreases.

Figure 5 illustrates the improvement ratio of Kendall’s Tau when the IIE method is compared with the benchmark methods. We define the improvement ratio $\eta$ as

$$\eta = \frac{\tau_{\mathrm{IIE}} - \tau_{c}}{\tau_{c}} \times 100\%,$$

where $\tau_{\mathrm{IIE}}$ represents Kendall’s Tau obtained by the IIE method and $\tau_{c}$ represents Kendall’s Tau calculated by a given benchmark method. If $\eta > 0$, the IIE method performs better than that benchmark. Figure 5 clearly shows that Kendall’s Tau increases considerably when the IIE method is compared with the benchmark methods. That is, in the six networks, the IIE method identifies the influential nodes more accurately than the benchmark methods. Compared with the IE method, the maximum value of $\eta$ reaches 80%. Similarly, Kendall’s Tau shows a significant increase when the IIE method is compared with the other benchmark methods in the US air network, and the same phenomenon occurs in the other networks. In particular, in the Facebook network, the maximum value of $\eta$ relative to the IE method reaches 120%.
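The improvement ratio can be made concrete with a small numerical sketch. It assumes that the ratio is the relative gain in Kendall’s Tau over a benchmark, expressed as a percentage; the function name and the two Tau values below are hypothetical and are not results from the experiments.

```python
def improvement_ratio(tau_iie, tau_benchmark):
    """Relative improvement of Kendall's Tau over a benchmark, in percent.

    Assumed form: (tau_iie - tau_benchmark) / tau_benchmark * 100.
    """
    if tau_benchmark == 0:
        raise ValueError("the benchmark Tau must be nonzero")
    return (tau_iie - tau_benchmark) / tau_benchmark * 100.0


# Hypothetical values: a benchmark Tau of 0.25 and an improved Tau of 0.55.
ratio = improvement_ratio(0.55, 0.25)
print(f"{ratio:.1f}%")
```

A positive ratio means that the improved method correlates more strongly with the real spreading influence than the benchmark does, and a ratio above 100% means that Kendall’s Tau has more than doubled.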

Figure 6 presents the imprecision function of each method and shows that the IIE method achieves strong results in the six networks. In small networks such as US air and e-mail, the results of the IIE method are remarkably superior to those of the benchmark methods. For instance, the imprecision of the IIE method is much lower than that of the benchmark methods, which means that the spreading outcome predicted by the IIE method is more reliable than that predicted by the benchmark methods. In the large LastFM network, the imprecision of the IIE method is much lower than that of the IE method. This result reveals that the IIE method is more accurate than the original IE method in identifying the most influential nodes. It is worth noting that the advantage of the IIE method over the benchmark methods is especially clear when the proportion of selected nodes is small. These results support the rationale of the IIE method, which considers the propagation feature of the target node.

4. Conclusions

For controlling the spreading process, one of the basic tasks is to estimate spreading influence and identify the influential nodes. By considering the information entropy and the spreading rate of the target nodes, we proposed an improved information entropy (IIE) method. The IIE method takes the spreading rate and the number of the target node’s neighbors into account, and this information determines the new information entropy. According to the simulation results, the IIE method achieves better performance than the IE method without adding any parameters or increasing computational complexity. In the six networks, the IIE method performs much better than the other benchmark methods, namely k-shell, degree centrality, closeness centrality, betweenness centrality, and the IE method. In the Facebook network in particular, the maximum improvement ratio relative to the IE method reaches 120%. The comparative analysis of the imprecision function shows equally good performance: in the six networks, the imprecision of the IIE method is much lower than that of the benchmark methods. These results demonstrate that the IIE method identifies the influential nodes more precisely than the benchmark methods. Moreover, the key component of the IIE method can be utilized by other centralities. For example, the information entropy of the IIE method can also be obtained from the neighbors’ k-shell values.

Although the IIE method identifies the influential nodes more accurately than the benchmark methods in the six networks, it also faces some challenges. One is that the IIE method considers the influence of the spreading rate only on the target node and neglects its impact on the target node’s neighbors. The distance to the neighbors also deserves more attention, because its value affects the performance of the IIE method, and the factors that determine the appropriate value remain to be identified. Temporal networks are attracting more and more attention, which calls for a more advanced information entropy method. This remains an interesting and open problem.

Data Availability

The datasets used in the present study are available from the first author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. U1733203), Safety Foundation of CAAC (no. AQ20200019), and Foundation of CAFUC (no. J2020-084).