Abstract

Efficient community detection in a complex network is considered an interesting issue due to its vast applications in many prevailing areas such as biology, chemistry, linguistics, social sciences, and others. There are several algorithms available for network community detection. This study proposes the Sigmoid Fish Swarm Optimization (SiFSO) algorithm to discover network communities efficiently. The proposed algorithm applies the sigmoid function to the various fish moves in a swarm, including Prey, Follow, Swarm, and Free Move, for better movement and community detection. The performance of the proposed SiFSO algorithm is tested against state-of-the-art particle swarm optimization (PSO) algorithms in terms of Q-modularity and normalized mutual information (NMI). The results show that the proposed SiFSO algorithm improves on MODPSO by 0.0014 in Q-modularity and by 0.1187 in NMI (absolute differences).

1. Introduction

Efficient community detection in a complex network is considered an interesting issue due to its vast applications in many prevailing areas such as biology, chemistry, linguistics, social sciences, and others [1, 2]. Complex systems are usually represented as complex graphs or networks that show the connections, dependencies, and interactions between the entities or components [3]. Networks consist of nodes (vertices) linked by edges to other nodes. From a mathematical and computer science perspective, complex networks are graph data structures representing large and complex real-world systems [4]. Examples of complex networks can be found in every branch of science. Newman categorizes complex networks into four broad categories: technological networks, social networks, information networks, and biological networks [5].

Community detection in a network can be represented as a data clustering problem. Clustering is the division of data into groups of similar objects, where the objects in one group differ from those in other groups [6]. Cluster analysis is the arrangement of a set of patterns, usually represented as measurement vectors or points in a multidimensional space, into clusters based on similarity [7]. Community structure is one of the significant features of complex networks; a community is a group of nodes that often interact with the other nodes in the same group [8]. A group or cluster is a set of nodes in a graph with more internal ties than external ties to the rest of the network [9].

According to Fortunato, the strategies for community detection can be classified [10] into two main classes, i.e., hierarchical clustering methods [11] and optimization-based methods [4]. In hierarchical clustering, the system is divided into several hierarchies representing different network parts at each level. Hierarchical clustering techniques can be further divided into two classes, i.e., divisive algorithms [12] and agglomerative algorithms [13]. In the divisive method, the graph is split into two subgraphs and the process is continued until the clear mark of the cluster is removed. The subgraphs are then aggregated based on adequate similarities to form new clusters. In the optimization-based approach, however, community detection is treated as an optimization problem that aims to find the best possible solutions for a predefined objective function. Evolutionary algorithms are usually used in complex systems to classify group structures [14]. For better accuracy, they typically use genetic algorithms [15] to find the optimal solution due to their significant features such as low convergence and parallel search. However, genetic algorithms may not perform well in identifying realistic network structures without prior knowledge [16]. According to the literature, swarm intelligence-based techniques such as particle swarm optimization (PSO) can handle the optimization problem efficiently [15]. PSO is a nature-inspired algorithm that uses the flocking behaviour of birds for searching.

The Artificial Fish Swarm Algorithm (AFSA) [17] is a well-known nature-inspired search algorithm that models the social behaviour of fish to perform various tasks. Fish have evolved deliberate behaviours that help ensure their survival. The quest for food, success, and risk management is part of a social process, and collaboration within the swarm benefits all of its members. This algorithm offers many attractive features such as reliability, fast convergence, fault tolerance, and high accuracy, due to which it can be used efficiently for community detection.

This paper presents a novel AFSA-based technique, the Sigmoid Fish Swarm Optimization (SiFSO) algorithm, to discover efficient network communities. In this algorithm, we introduced the sigmoid function for various fish moves in a swarm, including Prey, Follow, Swarm, and Free Move, for better movement and community detection. The proposed SiFSO algorithm’s performance is tested against other well-known swarm optimization algorithms, MODPSO, MPSO, and NE-PSO, in terms of two fitness functions Q-modularity and normalized mutual information (NMI). The results showed that the proposed SiFSO algorithm is better than the other selected algorithms, in terms of both NMI and Q-modularity. Moreover, the proposed SiFSO algorithm’s communities are very close to the original American football network communities with a minor difference. The results show that the Sigmoid Fish Swarm Optimization algorithm can effectively detect important communities in various networks, such as social networks, biological networks, and linguistics.

The rest of the paper is arranged as follows: in Section 2, we discuss related work. In Section 3, we present our proposed SiFSO algorithm in detail. In Section 4, we present the experimental setup and a discussion about the testing dataset and evaluation parameters, while in Sections 5 and 6, we discuss the experimental results and conclusions, respectively.

2. Related Work

Network community detection is considered an important issue in various fields such as computer science, physics, biology, and sociology [9, 18, 19]. The methods proposed for community detection can be divided into two major categories based on optimization- and hierarchical-based techniques. In the hierarchical-based approach, the network is divided into several hierarchies representing different network parts at each level. Hierarchical clustering techniques can be further divided into two classes, i.e., divisive algorithms [12, 20, 21] and agglomerative algorithms [18, 22, 23]. In the divisive method, the network is split into two subgraphs and the process is continued until the clear mark of the cluster is removed. The subgraphs are then aggregated based on adequate similarities to form new clusters. Well-known divisive algorithms are Girvan–Newman (GN) [20, 24] and GN variant proposed by Fortunato et al. [25]. In comparison, the agglomerative technique uses a hierarchical bottom-up approach to find communities in complex networks. There are many agglomerative clustering techniques available in the literature, such as Fast Newman (FN) [22] and the technique proposed by Du et al. [18].

Optimization-based strategies use different objective functions to find efficient and optimal clusters in complex networks. Newman and Girvan [20] used the Q-modularity objective function for community detection. Similarly, Brandes et al. [26] reported that genetic algorithms [16, 19, 27, 28], ant colony optimization [29, 30], and extremal optimization [31] could be efficiently used to find the optimal modularity value.

Usually, metaheuristic methods use iterative methods with learning strategies and hybrid approaches to discover network communities. For instance, Pizzuti proposed the community score concept in their proposed genetic network-based algorithm GA-Net [28]. Similarly, Shang et al. proposed another genetic network-based improved algorithm; however, the computational cost of this algorithm is very high [16]. To solve the high computational cost problem, Liu et al. used the ant colony optimization technique [16] for community detection. Pizzuti proposed a multiobjective method with community score and community fitness concepts [19].

In contrast, Qu proposed a hybrid approach that uses EO and PSO for community detection [32]. Currently, the swarm optimization technique is used efficiently for network community detection. The examples of swarm optimization-based methods are multiobjective discrete particle swarm optimization (MODPSO) [33], modified particle swarm optimization (MPSO) [1], and multiobjective discrete particle swarm optimization based on network embedding (NE-PSO) [2].

3. SiFSO: Sigmoid Fish Swarm Optimization Algorithm

Many optimization algorithms are available to detect the communities in a complex network; one of them is particle swarm optimization (PSO). PSO uses previously stored information to take the next step in the network, which may introduce errors, while the artificial fish swarm optimization algorithm takes movement decisions based on the current position and thus offers more accuracy. In this section, we present Sigmoid Fish Swarm Optimization (SiFSO), an improved version of fish swarm optimization, for more accurate detection of network communities. The proposed SiFSO algorithm is composed of two major steps, i.e., initialization and fish movement. In the initialization step, we set up the basic values for different parameters of the network, while in the next step, we use our proposed objective function based on fish movements to search for communities in a given network. The pseudocode of the proposed SiFSO algorithm is shown in Algorithm 1.

Algorithm 1: Fish movement.
Input: Visual Range, Visual Decrease, Minimum Visual Range, Pixels Iteration Number, Step, Step Decrease, Minimum Step, Try Number, Factor, Fish coordinates
Output: Each solution corresponds to a partition of a network.
(1) Begin Algorithm
(2) Min–Max Normalization
(3) Label Propagation Initialization
(4) for iteration ⟵ 1 to iteration number do
(5)  for FishNo ⟵ 1 to total Fish do
(6)   current Fish neighbors ⟵ 0
(7)   current Fish neighbors ⟵ Fish in visual range
(8)   if neighbors = 0
(9)    Next Move ⟵ Sigmoid (Free Move)
(10)    Break, go to step-1
(11)   else
(12)    if density > crowd factor and better food consistency
(13)     Next Move ⟵ Sigmoid (Prey Move)
(14)    else
(15)     Next Move ⟵ Random (Sigmoid (Swarm Move or Follow Move))
(16)  end
(17) end
(18) final result ⟶ apply modularity
(19) End Algorithm
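
The branch logic of steps (8) to (15) can be sketched in Python as follows. This is only an illustrative sketch: the function name `choose_move`, its parameters, and the string labels for the moves are hypothetical and do not come from the original implementation. The flag `food_improves` stands in for the "better food consistency" test, whose exact definition is not given in the listing.

```python
import random


def choose_move(n_neighbours, density, crowd_factor, food_improves, rng):
    """Pick the next move for one fish, mirroring steps (8)-(15) of the listing.

    All names here are illustrative assumptions, not the authors' code.
    """
    if n_neighbours == 0:
        # No fish inside the visual range: fall back to a Free Move.
        return "free"
    if density > crowd_factor and food_improves:
        # Crowded neighbourhood with better food available: Prey Move.
        return "prey"
    # Otherwise pick Swarm Move or Follow Move at random.
    return rng.choice(["swarm", "follow"])
```

Keeping the decision separate from the moves themselves makes it easy to check each branch in isolation before plugging in the sigmoid-based updates.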

The running time of the proposed SiFSO algorithm is estimated to be linear in n and m, where n is the population size and m is the number of iterations taken by the algorithm to find and refine all the clusters or communities.

3.1. SiFSO Operations

SiFSO is a nature-inspired fish swarm optimization-based algorithm that uses knowledge of the social activities of fish for optimization. In the water environment, fish can find a place that provides more food, either individually or in a group. In SiFSO, we improved the various movement patterns of the fish by introducing the sigmoid function. We use the sigmoid function with all fish movements, including Prey, Follow, Swarm, and Free Move, so that the fish turn smoothly instead of making sharp turns. The main aim of SiFSO is to find the food quality level in the vicinity and to gradually improve it; in our case, the food quality level corresponds to the cluster quality level.

3.1.1. Sigmoid Function

The sigmoid function is a nonlinear function usually used to map a wide range of input values into a small interval between 0 and 1. This function creates an “S”-shaped or sigmoid curve. The sigmoid function is used in cases where a specific mathematical model is not available. In this research, we used sigmoid functions to smooth sharp turns in fish movement. Mathematically, the sigmoid function is represented in the following equation:

$$S(z) = \frac{A}{1 + e^{-z}} \tag{1}$$

where e is the base of the natural logarithm, A represents the curve’s maximum value, and z represents any real number between −∞ and +∞.
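
As a small illustration, a scaled logistic curve of the form A / (1 + e^(−z)) can be written in Python as below. The function name `sigmoid` and the default A = 1 are choices made for this sketch; the two-branch form is a standard way to avoid overflow for large |z| and is not taken from the original implementation.

```python
import math


def sigmoid(z: float, A: float = 1.0) -> float:
    """Scaled logistic curve A / (1 + exp(-z)), with values in (0, A)."""
    if z >= 0:
        return A / (1.0 + math.exp(-z))
    # For negative z use the equivalent form A * e^z / (1 + e^z),
    # which avoids overflow in exp() when z is very negative.
    ez = math.exp(z)
    return A * ez / (1.0 + ez)
```

For inputs restricted to [−1, 1], as used for the fish moves, the output with A = 1 stays roughly between 0.27 and 0.73, which is what damps abrupt changes of direction.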

3.1.2. Density

Density represents the number of fishes or nodes inside the visual range. The value of density varies between 0 and 1, where 1 shows high density and 0 shows low density. Mathematically, the density is represented in equation (2).
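
One way to obtain a density in [0, 1] is to divide the number of fish inside the visual range by the number of other fish in the population. The sketch below does exactly that; the normalisation by the number of other fish and the names `density`, `positions`, and `visual_range` are assumptions made for illustration, since the exact formula is not reproduced here.

```python
import math


def density(fish_index, positions, visual_range):
    """Fraction of the other fish lying within visual_range of one fish.

    Assumed normalisation: visible neighbours / (population size - 1).
    """
    me = positions[fish_index]
    others = [p for i, p in enumerate(positions) if i != fish_index]
    if not others:
        return 0.0
    visible = sum(1 for p in others if math.dist(me, p) <= visual_range)
    return visible / len(others)
```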

3.1.3. Free Move

In nature, when a fish reaches a point where it cannot find any food, it moves randomly in any direction. Similarly, in the artificial fish swarm optimization algorithm, when a fish reaches the vicinity boundary, it takes a direction calculated using the sigmoid function. Mathematically, the Free Move function is represented in equation (3), where F(t) is the fish’s position at time t, step represents the movement increment, and the sigmoid function is applied between −1 and 1 to calculate a new direction.
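
A minimal sketch of such a free move is given below. It assumes an update of the form F(t + 1) = F(t) + step × direction, where each component of the direction is a sigmoid of a random number drawn from [−1, 1]. Centring the sigmoid with 2·S(r) − 1, so that the fish can move in either direction along each axis, is an assumption of this sketch and not something stated in the text.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def free_move(position, step, rng):
    """Return a new position after one sigmoid-smoothed random step."""
    new_position = []
    for coordinate in position:
        r = rng.uniform(-1.0, 1.0)           # random draw in [-1, 1]
        direction = 2.0 * sigmoid(r) - 1.0   # assumed centring: sign follows r
        new_position.append(coordinate + step * direction)
    return new_position
```

Because the centred sigmoid of a value in [−1, 1] never exceeds about 0.46 in magnitude, each coordinate changes by less than half a step, which keeps the random walk gentle.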

3.1.4. Prey Move

Usually, each fish continuously searches for food and for points where it can find extra food. This movement is called the prey movement in the artificial fish swarm optimization algorithm. To prey, a fish first checks its visual range for food (shown in equation (4)). Then, it makes a move towards the food based on food density (shown in equation (5)). Here, Fi is the fish’s current position, t is the time at the current position, step is the movement increment, and t + 1 denotes the next move. Distance is calculated as the Euclidean distance between the present and next position. The sigmoid function is applied between −1 and 1 to calculate a new direction.
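
The two stages described above (look inside the visual range, then step towards a better spot) can be sketched as follows. The sketch assumes the standard AFSA prey update with the random factor replaced by a sigmoid of a number drawn from [−1, 1]; the function `food`, the use of `try_number` as the number of attempts, and the cap that prevents overshooting the candidate are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def prey_move(position, food, visual_range, step, try_number, rng):
    """Try up to try_number random spots in the visual range; step towards
    the first one whose food value is better than the current one."""
    for _ in range(try_number):
        candidate = [c + visual_range * rng.uniform(-1.0, 1.0) for c in position]
        if food(candidate) > food(position):
            distance = math.dist(position, candidate)
            # Step length damped by the sigmoid, never past the candidate.
            length = min(step * sigmoid(rng.uniform(-1.0, 1.0)), distance)
            return [c + length * (t - c) / distance
                    for c, t in zip(position, candidate)]
    # No better spot found; a caller could fall back to a Free Move here.
    return list(position)
```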

3.1.5. Swarm Move

One of the properties of fish as a swarm is that they generally attempt to move with each other as a group to achieve goals. This collective movement helps the fish to reach their goal quickly without being scattered, and it is called the swarm movement. In the artificial swarm movement, the fish first calculates the central position of the swarm and keeps itself near the center to achieve a specific goal as a swarm. The formula for the swarm center calculation is shown in equation (6). Then, the fish moves according to the swarm movement to achieve goals such as searching for food. The formula for swarm movement is shown in equation (7). Here, Fi is the current position of the fish, t is the time at the current position, step is the movement increment, t + 1 denotes the next move, and Fcenter is the swarm’s central position. Distance is calculated as the Euclidean distance between the present and central position. The sigmoid function is applied between −1 and 1 to calculate a new direction.
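
The centre calculation and the move towards it can be sketched as below. The sketch assumes that the centre is the arithmetic mean of the neighbour positions and that the step towards it is scaled by a sigmoid of a random number in [−1, 1], following the standard AFSA pattern; the helper names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def swarm_centre(neighbours):
    """Arithmetic mean of the neighbour positions (assumed centre definition)."""
    n = len(neighbours)
    return [sum(p[k] for p in neighbours) / n for k in range(len(neighbours[0]))]


def swarm_move(position, neighbours, step, rng):
    """Move one sigmoid-damped step from position towards the swarm centre."""
    centre = swarm_centre(neighbours)
    distance = math.dist(position, centre)
    if distance == 0.0:
        return list(position)  # already at the centre
    length = min(step * sigmoid(rng.uniform(-1.0, 1.0)), distance)
    return [c + length * (t - c) / distance for c, t in zip(position, centre)]
```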

3.1.6. Follow Move

When one or more fishes find food during swarm movement, they change their direction to get that food. In that case, some of the neighbor fishes tail them to get more food. This movement is called follow movement. For follow movement, a fish keeps checking all fishes in its visual range for better food opportunities as compared to its current state. The formula for follow movement is shown in equation (8). Here, Fi is the current position of the fish, t is the time at the current position, step is the movement increment, t + 1 denotes the next move, and Fn indicates the number of neighbor fishes. Distance is calculated as the Euclidean distance between the current and central position. The sigmoid function is applied between −1 and 1 to calculate a new direction.
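
A follow move can be sketched in the same style. The version below assumes that the fish steps towards the neighbour with the best food value, and only if that value beats its own; this mirrors the standard AFSA follow behaviour, and the function `food` and the other names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def follow_move(position, neighbours, food, step, rng):
    """Step towards the best-fed neighbour if it is better off than this fish."""
    if not neighbours:
        return list(position)
    best = max(neighbours, key=food)
    if food(best) <= food(position):
        return list(position)  # nobody in range is doing better
    distance = math.dist(position, best)
    length = min(step * sigmoid(rng.uniform(-1.0, 1.0)), distance)
    return [c + length * (t - c) / distance for c, t in zip(position, best)]
```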

4. Experimental Setup

The proposed SiFSO algorithm is implemented and simulated in MATLAB version 2013 using Intel Core i3 CPU 2.67 GHz and 4 GB of RAM. We used C++ and MS Excel to perform preprocessing steps such as data normalization. The work by Bastian et al. [35] is used to visualize the communities detected by our proposed SiFSO algorithm and Q-modularity calculation. In this section, we discuss the simulation parameters, dataset, and evaluation parameters.

4.1. Simulation Parameters

As an input, SiFSO will take multiple parameters, including Visual Range, Iteration Number, Steps, Try Number, and Crowd Factor. The input values selected in this research are shown in Table 1.

The details of the inputs are given as follows:
(1) Visual Range. The visual range is the area that a fish can observe around its position. It is global at first and then decreases over time, so that the fish later searches only its local surroundings.
(2) Iteration Number. The iteration number is the number of rounds in which all fish in the population move to create clusters based on their experience. The outcome depends on the visual range and the steps taken by the fish in each round.
(3) Steps. The step is the distance a fish moves at a time. It is long at first, matching the global visual range, and then decreases along with the visual range until the fish settle in their areas of concern.
(4) Try Number. The try number applies to the prey behaviour. It is the number of randomly selected candidate positions whose quality is checked against the current position before the fish moves to one of them.
(5) Crowd Factor. The crowd factor is a number that helps to decide whether the fishes inside the visual range form a crowd.

4.2. Dataset

The experiments are conducted on a benchmark American college football dataset. The dataset comprises college teams grouped into 12 conferences [24] and represents the games played between the teams during the fall of 2000. The nodes have values that indicate to which conferences they belong. The nodes are the teams, and the edges are the games played between the teams. Every node is assigned a node ID (ranging from 0–114) and a conference ID (ranging from 0–11), and the network has 616 edges (matches between two different teams). Figure 1 shows the matches played between the various teams in the form of edges connecting nodes. The image is based on the real clusters that existed on the ground during the tournament.

4.3. Evaluation Parameters

The performance of the proposed algorithm is evaluated with two fitness functions, i.e., Q-modularity [36] and normalized mutual information (NMI) [37]. The proposed SiFSO algorithm is compared with state-of-the-art multiobjective discrete particle swarm optimization (MODPSO) [33], modified particle swarm optimization (MPSO) [1], and multiobjective discrete particle swarm optimization based on network embedding (NE-PSO) [2]. Both fitness functions test the clusters’ efficiency and accuracy created through any complex network community detection techniques.

Network or Q-modularity evaluates the quality of the detected communities. Quantitatively, modularity is the fraction of edges that fall within a cluster or community minus the expected value of that fraction if the edges were placed at random, independent of the group structure. The modularity Q is defined as follows:

$$Q = \sum_{s=1}^{k}\left[\frac{l_s}{m} - \left(\frac{d_s}{2m}\right)^{2}\right] \tag{9}$$

where k is the number of clusters, ls is the total number of edges joining the vertices within cluster s, ds represents the sum of degrees of all nodes in s, and m is the total number of edges in the selected network.
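
As a worked illustration, the modularity of a partition can be computed from an edge list using the quantities ls, ds, and m defined above, with Q taken as the sum over clusters of ls/m − (ds/2m)². The function name `modularity` and the dictionary-based input format are choices made for this sketch, which assumes an undirected, unweighted graph without self-loops.

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum over clusters s of  l_s / m - (d_s / (2 m))^2.

    edges:     list of (u, v) pairs of an undirected graph
    community: dict mapping each node to its cluster label
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # l_s: edges with both ends in cluster s
    degree_sum = defaultdict(int)  # d_s: sum of degrees of nodes in s
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[s] / m - (degree_sum[s] / (2 * m)) ** 2
               for s in degree_sum)
```

For two triangles joined by a single edge and split into their natural halves, each cluster has ls = 3 and ds = 7 with m = 7, which gives Q = 2 × (3/7 − 1/4) = 5/14.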

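Normalized mutual information (NMI) is used to measure the similarity between the real network communities and the communities detected by the proposed algorithm. Consider two different partitions, A and B, of the same network detected by two different methods. Let partition A have R communities and partition B have D communities. A confusion matrix C is defined in which the entry Cij represents the number of nodes shared by community i of A and community j of B. Mathematically, the normalized mutual information between A and B is defined as follows:

$$\mathrm{NMI}(A, B) = \frac{-2\sum_{i=1}^{R}\sum_{j=1}^{D} C_{ij}\log\left(\frac{C_{ij}N}{C_{i\cdot}C_{\cdot j}}\right)}{\sum_{i=1}^{R} C_{i\cdot}\log\left(\frac{C_{i\cdot}}{N}\right) + \sum_{j=1}^{D} C_{\cdot j}\log\left(\frac{C_{\cdot j}}{N}\right)} \tag{10}$$

where C_{i·} is the sum of row i of C, C_{·j} is the sum of column j of C, and N is the total number of nodes.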
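
The NMI between two partitions can be computed directly from the confusion-matrix counts. The sketch below uses the commonly used form NMI = −2 Σ Cij log(Cij N / (Ci· C·j)) divided by Σ Ci· log(Ci·/N) + Σ C·j log(C·j/N), where Ci· and C·j are row and column sums and N is the number of nodes. The function name `nmi`, the list-of-labels input format, and the convention of returning 1.0 when both partitions consist of a single community are assumptions of this sketch.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same nodes."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # C_ij
    row = Counter(labels_a)                   # C_i. (community sizes in A)
    col = Counter(labels_b)                   # C_.j (community sizes in B)
    numerator = -2.0 * sum(
        c * math.log(c * n / (row[i] * col[j])) for (i, j), c in joint.items()
    )
    denominator = (sum(c * math.log(c / n) for c in row.values())
                   + sum(c * math.log(c / n) for c in col.values()))
    if denominator == 0.0:
        return 1.0  # assumed convention: both partitions are one community
    return numerator / denominator
```

The value is 1 when the two partitions are identical up to relabeling and 0 when they are statistically independent.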

5. Results and Discussions

Efficient community detection in a complex network is considered an interesting issue due to its vast applications in many prevailing areas such as biology, chemistry, linguistics, and social sciences. There are several algorithms available for network community detection. In this research, we proposed the Sigmoid Fish Swarm Optimization (SiFSO) algorithm to discover network communities efficiently. The proposed SiFSO algorithm’s performance is tested against other well-known swarm optimization algorithms, MODPSO, MPSO, and NE-PSO, in terms of two fitness functions, namely, Q-modularity and normalized mutual information (NMI). The performance of SiFSO is better due to the addition of the sigmoid function to decide fish movement.

This section discusses the results of the performance evaluation experiments we conducted with SiFSO and compares them with the performance of the other selected swarm optimization algorithms. For experimentation, we used the benchmark American college football network dataset, which comprises 115 nodes and 616 edges in 12 communities. The vertices represent the teams, and the edges represent the games played between them. The existing communities of this network are shown in Figure 2.

The results show that the communities created through SiFSO are simpler and better than those of the other selected algorithms in terms of fitness functions and community discovery. According to the chosen fitness function results, the normalized mutual information values achieved by MODPSO, MPSO, and NE-PSO on the given dataset are 0.8616, 0.9803, and 0.9096, respectively. The communities detected by the proposed SiFSO algorithm achieved a normalized mutual information value of 0.9803. Although this equals the value attained by MPSO, it is the maximum NMI value among the selected swarm optimization algorithms. The NMI values of the selected algorithms are shown in Figure 3. In absolute terms, SiFSO scores 0.0014 higher in Q-modularity and 0.1187 higher in NMI than MODPSO. Similarly, SiFSO scores 0.0846 higher in Q-modularity than MPSO. This improvement in terms of fitness functions gives an edge to SiFSO over the other selected algorithms.

Similarly, according to the selected fitness functions’ results, the Q-modularity achieved by MODPSO, MPSO, and NE-PSO on a given dataset is 0.6032, 0.52, and 0.5825, respectively. In contrast, the communities detected by the proposed SiFSO algorithm achieved a maximum of 0.6046 Q-modularity, which is the maximum in all selected swarm optimization algorithms. The Q-modularity results of the chosen algorithms are shown in Figure 4, while Figure 5 depicts NMI and Q-modularity values in a 2D scattered plot.
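
The margins between SiFSO and the other algorithms follow directly from the reported scores. The short Python check below recomputes them as absolute differences; the dictionary layout and the helper name `gains` are only illustrative.

```python
# Scores reported in this section for the American college football network.
nmi_scores = {"MODPSO": 0.8616, "MPSO": 0.9803, "NE-PSO": 0.9096, "SiFSO": 0.9803}
q_scores = {"MODPSO": 0.6032, "MPSO": 0.52, "NE-PSO": 0.5825, "SiFSO": 0.6046}


def gains(scores, ours="SiFSO"):
    """Absolute difference between our score and each other algorithm's score."""
    return {name: round(scores[ours] - value, 4)
            for name, value in scores.items() if name != ours}


print(gains(q_scores))    # Q-modularity margins over MODPSO, MPSO, NE-PSO
print(gains(nmi_scores))  # NMI margins over MODPSO, MPSO, NE-PSO
```

The differences are small absolute margins on scores that lie between 0 and 1 rather than percentages; for instance, the Q-modularity margin over MODPSO is 0.6046 − 0.6032 = 0.0014.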

Similarly, the communities detected by the proposed SiFSO algorithm are very close to the original American football network communities, with a minor difference. In contrast, the communities detected by the other selected swarm optimization algorithms have too many mismatched nodes in each cluster. The communities detected by all selected algorithms and the proposed algorithm are shown in Figure 6. Figures 6(a)–6(c) exhibit the communities discovered by MODPSO, MPSO, and NE-PSO, which have too many mismatched nodes as compared to SiFSO (shown in Figure 6(d)).

6. Conclusions

Efficient community detection in a complex network is considered an interesting issue due to its vast applications in many prevailing areas such as biology, chemistry, linguistics, and social sciences. There are several algorithms available for network community detection. In this research, we proposed the Sigmoid Fish Swarm Optimization (SiFSO) algorithm to discover efficient network communities. In this algorithm, we introduced the sigmoid function for various fish moves in a swarm, including Prey, Follow, Swarm, and Free Move, for better movement and community detection. The proposed SiFSO algorithm’s performance is tested against other well-known swarm optimization algorithms, MODPSO, MPSO, and NE-PSO, in terms of two fitness functions, namely, Q-modularity and normalized mutual information (NMI). The results showed that the proposed SiFSO algorithm is better than the other selected algorithms in terms of both NMI and Q-modularity.

Moreover, the proposed SiFSO algorithm’s communities are very close to the original American football network communities, with a minor difference. The results show that the Sigmoid Fish Swarm Optimization algorithm can effectively detect important communities in various networks. These communities can be used for more discoveries in many prevailing scientific areas such as proteins and drugs, social media and advertisements, user profiles, and financial fraud. In the future, we are interested in improving the network community discovery process by employing more efficient nature-inspired algorithms such as Killer Whale Algorithms (KWA) and using them in various areas such as financial fraud detection.

Data Availability

The dataset used is available online at http://cs.binghamton.edu/~mrldata/Network%20Data%20Sets.html.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by Islamia College, Peshawar, KP, Pakistan.