Particle Swarm Optimizer with Aging Operator for Multimodal Function Optimization

This paper proposes a new scheme for preventing a Particle Swarm Optimizer from premature convergence on multimodal optimization problems. Instead of relying on fitness evaluation alone, we use a new index called particle age to guide the population towards more promising regions of the search space. The particle age measures how long a particle has gone without moving towards a better solution. The main novelty of the proposed method is to let each particle learn not only from neighbours with better fitness values but also from neighbours whose fitness values are updated more frequently. To achieve this, we design a comprehensive age-based learning strategy, in which age is used for excluding old particles, selecting learning exemplars and deciding the mutation strength and inertia weight of each particle. Experiments were conducted on 15 multimodal test functions to assess the performance of this new strategy in comparison with 7 state-of-the-art PSOs from the literature. The experimental results show the good performance of the proposed algorithm in solving multimodal functions when compared with several existing PSO variants.


Introduction
Particle Swarm Optimizer (PSO) has been shown to be very successful in solving complex and challenging optimization problems 1,2. However, a common problem often experienced when applying PSO to multimodal function optimization is that the particle population loses diversity too rapidly, before it converges to some reasonable solutions 3,4,5,6,7, which is commonly referred to as premature convergence.
Various schemes have been proposed to enhance PSO to cope with premature convergence. These methods can be divided into two categories, one employing population diversity maintenance and another adopting some mechanism to escape local optima. More specifically, methods for diversity maintenance include those using parameter adjustment 8,9,10, mutation 3,11,12, improved topology structure 5,7,13,14 and multi-population 4,13,15,16. Although these works keep a reasonable balance between diversity and convergence, the balance is still far from optimal, and when it is lost, sometimes the only choice is to restart 17. The restarting strategy does help to improve the performance of PSO 18; however, one problem is deciding when the entire population needs restarting.
An alternative strategy to restarting the entire swarm is to restart part of the population regularly in conjunction with the use of an aging operator 19. Basically, the aging operator works by accumulating each individual's age: an individual whose age exceeds the maximal age τ is removed from the current population and a new, randomly generated individual is inserted. τ is predefined by the user to determine the lifespan of each individual. The aging operator maintains population diversity through consecutive individual replacement during the search process. Compared to restarting the entire population, one advantage of the aging operator is that new individuals coexist with the old population and can learn useful information from it. It was previously demonstrated that the aging operator can achieve performance improvements that restarts cannot 20. In the past decade, the aging operator has been employed in many evolutionary algorithms (EAs) to control how a population is generated 21,22,23. Two typical aging operators are evolutionary aging 24 and static pure aging 21,22. Jansen's work 25 shows that static pure aging can help escape from local optima, while evolutionary aging is more effective at optimizing functions with plateaus.
The aim of this paper is to propose a new age-based search strategy to improve the performance of PSO for solving multimodal problems. We propose four core age-based operators: particle replacement, neighbour selection, hypermutation and inertia weight adjustment. This new algorithm differs from the existing age-based EAs and PSOs in the following aspects.
(1) We propose a new and PSO-oriented age definition that can be used for detecting both fitness stagnation and oscillation.
(2) The age index is used not only for excluding particles but also for constructing a comprehensive age-based learning strategy.
(3) Instead of the gbest population topology, an age-based population topology is used to select neighbours for each particle, to improve the performance of PSO on multimodal problems in particular.
(4) A new age-based hypermutation is developed to decide the mutation strength for selected particles according to their age.
(5) A new age-based adaptive scheme is used to dynamically determine the inertia weight for each particle. In this scheme, particles with different ages have different inertia weights.
The rest of this paper is organized as follows. Related work on PSO and aging operators is reviewed in Section 2. In Section 3, we investigate two important phenomena of PSO to explain why the proposed definition of particle age is useful. Then, we present the framework of the proposed age-based learning strategy in Section 4. Experimental results are shown in Section 5. The discussion and conclusion are presented in Section 6.

PSO
PSO was first proposed by Kennedy and Eberhart in 1995 26. Each particle i is composed of two vectors: a position vector x_i^t = (x_{i,1}^t, x_{i,2}^t, ..., x_{i,D}^t) and a velocity vector v_i^t = (v_{i,1}^t, v_{i,2}^t, ..., v_{i,D}^t), where D denotes the dimension of the search space and t is the generation. Each particle's personal best position (pbest) p_i^t = (p_{i,1}^t, p_{i,2}^t, ..., p_{i,D}^t) and its neighbourhood's best position (nbest) p_n^t = (p_{n,1}^t, p_{n,2}^t, ..., p_{n,D}^t) are used for communicating information about good positions in the search space to each particle. x_i^t and v_i^t are updated as follows 9:

v_{i,d}^{t+1} = ω v_{i,d}^t + c_1 r_1 (p_{i,d}^t − x_{i,d}^t) + c_2 r_2 (p_{n,d}^t − x_{i,d}^t)   (1)

x_{i,d}^{t+1} = x_{i,d}^t + v_{i,d}^{t+1}   (2)

where ω is called the inertia weight and c_1 and c_2 are acceleration constants. r_1 and r_2 are two random numbers in the range [0, 1]. ω dampens the influence of the previous velocity on the current velocity and can be interpreted as the fluidity of the medium in which each particle moves 27. c_1 and c_2 control the weight of the stochastic acceleration terms that pull each particle toward pbest and nbest, respectively 5. To better control the velocity, a positive value V_max is used to clamp each particle's velocity on each dimension within [−V_max, V_max]. According to the population topology structure, there are two kinds of PSO: the global best version (gbestPSO) and the local best version (lbestPSO). In gbestPSO, each particle chooses the global best position of the current population as its nbest. In lbestPSO, each particle selects the best position in the neighbourhood defined by the local population topology as its nbest.
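The velocity and position update of Eqs. (1) and (2) can be sketched as follows; this is a minimal illustration, and the default parameter values (ω = 0.729, c_1 = c_2 = 1.494) are common choices from the literature rather than settings prescribed by this paper.

```python
import numpy as np

def pso_step(x, v, pbest, nbest, w=0.729, c1=1.494, c2=1.494, vmax=1.0):
    """One canonical PSO update: the old velocity is damped by the inertia
    weight w, pulled toward pbest and nbest by stochastic acceleration terms,
    clamped to [-vmax, vmax] per dimension, then added to the positions."""
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (nbest - x)
    v = np.clip(v, -vmax, vmax)          # velocity clamping
    return x + v, v
```

In gbestPSO, `nbest` would be the single global best position broadcast to all rows; in lbestPSO it differs per particle.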

Diversity maintenance methods in PSO
Many works have been reported on the issue of maintaining the population diversity of PSO, and some representative works are reviewed here. PSO's search behaviour is highly influenced by two parameters, i.e. the inertia weight and the acceleration constants, so the easiest method to keep effective diversity during the search may be to adjust these two parameters dynamically, for example with a time-decreasing inertia weight 9, a different inertia weight for each particle 28, or time-varying acceleration constants 8,10. Another important parameter is the population size; examples include adjusting the population size according to the status of the global best position 29, and incremental social learning, which has been introduced to construct a growing population 6. Besides these learning parameters, the population topology also plays an important role in maintaining diversity. Many static lbest topologies have been proposed to improve PSO's performance on multimodal functions, such as the ring topology 16, the von Neumann topology 30 and the fully informed structure 31. Eberhart and Kennedy 32 have shown that a static lbest ring topology converges more slowly than the gbest topology but performs better on multimodal functions. Recently, some dynamic topologies have been developed, for example the random topology 13, comprehensive learning 5 and orthogonal learning 7. Some effective diversity-maintaining methods used in other EAs have also been introduced to PSO. Mutation is one of them; several mutation strategies have been proposed, such as Gaussian mutation, Cauchy mutation and hypermutation 12. Besides mutation, multi-population methods have also been introduced to PSO for multimodal function optimization. The first category of multi-population method in PSO evolves multiple subpopulations in parallel, with each subpopulation searching a different region of the landscape. This method is often used to search the landscape of a multimodal function or to deal with dynamic optimization problems.
Cluster-based PSO 33,34 and niching PSO 16,35 are two representative multi-population based methods. Another multi-population based method is cooperative coevolution PSO, which splits solution vectors into multiple smaller vectors, each of the resulting smaller search spaces being searched by a separate population. Van den Bergh and Engelbrecht 4 introduced cooperative coevolution into PSO, and Li and Yao 36 later improved it to optimize large-scale continuous problems.

Aging operators
When premature convergence happens, the only remedy is to restart the entire population and search again until the termination condition is reached. Instead of restarting all individuals, the aging operator selects part of the individuals to restart each time according to their ages. In a way, the aging operator can be seen as a steady-state version of the restart method.
Most of the current aging operators are used to remove particles with poor performance. Two frequently used aging operators are evolutionary aging 23,37,24,38 and static pure aging 22,21,39,40. Evolutionary aging is often used in EAs and static pure aging in artificial immune systems (AISs). What they have in common is that each individual whose age exceeds the maximal age τ is replaced by a new, randomly generated individual. The main difference between the two aging operators lies in the definition of an individual's age. In evolutionary aging, each new offspring generated through crossover or mutation is assigned age 0 and the age of each remaining individual is increased by one in each generation 23,24; the parent individuals are included in the remaining individuals if parents can coexist with their offspring 24. In static pure aging, each new offspring is assigned age 0 only if its fitness is better than its parent's fitness; otherwise it inherits its parent's age 21. It is obvious that the individual age defined in evolutionary aging does not depend on individual fitness but measures how long the individual survives. In contrast, the age in static pure aging represents how long an individual has failed to reproduce a better offspring. For a minimization problem, the ages in the two aging operators are defined as follows 25:

Definition 1 (evolutionary age): y.age = 0.

Definition 2 (static pure age): if f(y) < f(x) then y.age = 0 else y.age = x.age.
where x and y represent the parent and child individual, respectively. Jansen's recent work 25 compares the above two aging operators and shows that static pure aging can recognize local optima while evolutionary aging fails; on the other hand, evolutionary aging is able to optimize functions with plateaus but static pure aging is not.
To improve the performance of static pure aging on plateaus, a new aging scheme called genotypic aging was also proposed 25. The age in genotypic aging is defined as follows:

Definition 3 (genotypic age): if f(y) ⩽ f(x) ∧ y ≠ x then y.age = 0 else y.age = x.age.

Different from static pure aging, an offspring with the same fitness as, but at a different position from, its parent is assigned age 0 in genotypic aging. This mechanism obviously allows a random walk on plateaus. Besides the above three aging schemes, Hornby 41 proposed a new one called the age-layered population structure (ALPS). In ALPS, randomly created individuals start with age 0 and the age is then increased by 1 in each generation in which an individual is used for producing an offspring. Individuals created through mutation and crossover start with age 1 plus the maximal age of their parents.
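The three age-update rules above can be sketched as small functions (minimization is assumed, as in Definitions 1-3; the function names and signatures are our own, not from the cited works):

```python
def evolutionary_age(parent_age, child_fit, parent_fit):
    # Definition 1: a new offspring always starts from age 0,
    # regardless of fitness.
    return 0

def static_pure_age(parent_age, child_fit, parent_fit):
    # Definition 2: reset only on strict improvement over the parent;
    # otherwise the offspring inherits the parent's age.
    return 0 if child_fit < parent_fit else parent_age

def genotypic_age(parent_age, child_fit, parent_fit, child_x, parent_x):
    # Definition 3: also reset when the fitness is no worse but the point
    # has moved, which permits a random walk across plateaus.
    if child_fit <= parent_fit and child_x != parent_x:
        return 0
    return parent_age
```

Note how only the genotypic rule resets age on equal-fitness moves, which is exactly what lets it make progress on plateaus where static pure aging stalls.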

Fitness stagnation and oscillation
Before defining a new age for PSO, let us first consider a drawback of the original PSO, which will be used to explain why the existing age definitions are unsuitable for PSO. To do this, consider the original PSO with the gbest topology for multimodal function minimization. Eliminating the velocity term from Eq. (1) and (2) gives:

x_i^{t+1} = (1 + ω − φ_1 − φ_2) x_i^t − ω x_i^{t−1} + φ_1 p_i^t + φ_2 p_g^t   (3)

where φ_1 = c_1 r_1 and φ_2 = c_2 r_2. A particular solution of this second-order difference equation Eq. (3) is:

o_i^t = (φ_1 p_i^t + φ_2 p_g^t) / (φ_1 + φ_2)   (4)

Furthermore, existing work has proved the following equation under certain conditions 42:

lim_{t→∞} x_i^t = o_i^t   (5)

This means that particle i will converge to or oscillate around an equilibrium point E(o_i^t), which is a weighted average of p_i^t and p_g^t, if t is big enough. In the initial stage, because |gbest − x| is significantly larger than |pbest − x|, each particle is attracted toward the gbest position owing to its large influence. As the search continues, the gap between |gbest − x| and |pbest − x| is gradually reduced and the velocity of each particle becomes small. Under this circumstance, if gbest and pbest are in the same valley of the fitness landscape, the particle is still attracted to move toward gbest. However, this brings a problem: if gbest is a local optimum, the particle may be unable to jump out of it once its pbest has moved into the same area as gbest 5. We call this fitness stagnation, because all individuals are trapped in the same valley or on the same peak and cannot jump out. Another situation is that if the fitness values of gbest and pbest are very close but they lie in two different valleys of the function landscape, they make the particle oscillate 16 between them. Oscillation is also regarded as a kind of premature convergence, since gbest and pbest do not change during oscillation. We call this fitness oscillation.
When fitness stagnation takes place, both the particle's fitness value and its pbest fail to improve further. When fitness oscillation happens, the particle's fitness value fluctuates but its pbest does not change. Hence, all three existing aging operators fail to detect fitness oscillation, because none of them uses pbest information.

Particle age
To recognize the fitness oscillation of PSO, we propose a new definition of age for particles in PSO. The new age is defined as follows:

Definition 4 (particle age): if f(p_i^{t+1}) < f(p_i^t) then x_i^{t+1}.age = 0 else x_i^{t+1}.age = x_i^t.age + 1.

In particle age, each new particle starts from age 0 and the change of its age depends on whether it can find a better pbest after one iteration. If a particle finds a better pbest to replace the current one, its age is reset to 0. If it fails to update its pbest, its age is increased by 1 in that iteration. A distinct feature of particle age is that it combines the particle's best historical information with its current information to determine the particle's age. It can recognize both kinds of premature convergence, because whichever of them happens, the particle's pbest cannot change, which causes a linear increase in the particle's age. When such a sharp increase is detected, remedial measures, such as adjusting parameters and restarting, can be used to allow the particle to escape from the current local optimum. An example of the change of particle age during optimization is shown in Fig. 1. The observed particle was randomly selected from the 40 particles of a basic PSO optimizing the 30-D Rastrigin function 5. From Fig. 1, this particle found it relatively easy to update its pbest at the earlier stage of the search but difficult to find a better pbest after about 1500 generations. This may be caused by the loss of population diversity after a certain number of generations. Furthermore, the same tendency appeared on other multimodal functions, such as the Rosenbrock, Griewank, Ackley, Weierstrass and Schwefel functions 5, when the original PSO was used.
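The particle-age rule of Definition 4 reduces to a one-line update per iteration; this sketch compares the new pbest fitness against the previous one (minimization), with naming of our own choosing:

```python
def update_particle_age(age, old_pbest_fit, new_fit):
    """Definition 4: age resets to 0 only when the particle improves its own
    pbest; otherwise it grows by 1. Both stagnation (pbest stuck at a local
    optimum) and oscillation (fitness fluctuates but pbest never improves)
    therefore show up as a linearly increasing age."""
    return 0 if new_fit < old_pbest_fit else age + 1
```

Because the rule looks at pbest rather than raw fitness, an oscillating particle whose fitness keeps fluctuating still ages steadily, which is the property the three existing aging operators lack.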

Framework of PSOA
As defined above, the age of each particle indicates whether the space around it is a promising region for itself or for other particles to search for better solutions. A very old particle means the direction on which it lies is hopeless, and we do not want any particle to search toward it. In other words, the population is more likely to find a better solution if more particles search around a particle whose age is zero. According to this basic guideline, we propose the following three learning principles. First, any particle whose age exceeds the permitted maximal age is replaced by a new random particle. Second, each particle learns information from younger particles. Third, to avoid being replaced, an older particle should be given a higher probability of updating its pbest. PSOA starts from a randomly generated initial swarm with a random topology. The framework of PSOA is given in Algorithm 1. The four main components of PSOA, i.e. particle replacement, neighbour selection, hypermutation and inertia weight adjustment, which are the four age-based operators, are described in the following sections.

Particle replacement
The original aging operator that excludes particles whose age exceeds τ is used in PSOA to manage the population dynamically; we call it particle replacement in this paper. At the end of each generation, the age of each particle is computed according to the particle age defined in Definition 4. Specifically, if particle i finds a new position that is strictly better than its current pbest, its age α_i is reset to 0; otherwise α_i is increased by 1. When the latter happens, α_i is compared with τ. If α_i > τ, particle i is excluded from the current population; then a new particle with age 0 is randomly generated in the search space and inserted into the population. The above process is executed on each particle in the population.

Algorithm 1 The framework of the proposed PSOA
Initialization:
1: Randomly generate an initial swarm S with size P;
2: Set the fitness evaluations counter fevals = 0;
3: Set the ages of the initial swarm α_i = 0 (i = 1, 2, ..., P);
Iterations:
4: while stop condition is not satisfied do
5:   if regular interval is reached then
6:     ParticleReplacement(S, α, τ);
7:     NeighbourSelection(S, α);
8:   end if
9:   for i = 1 : P do
10:    InertiaWeightAdjustment(S, ω, α);
11:    Update velocity and position according to (1) and (2);
12:  end for
13:  if mutation condition is satisfied then
14:    Hypermutation(S, α);
15:  end if
16:  FitnessEvaluation(S, fevals);
17:  Update ages according to Definition 4;
18: end while
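The particle replacement step can be sketched as follows; this is a minimal illustration under the assumption of a box-constrained search space, with a list-based swarm representation of our own choosing:

```python
import random

def particle_replacement(swarm, ages, tau, lower, upper, dim):
    """Age-based replacement: every particle whose age exceeds tau is removed
    and replaced by a fresh, uniformly random particle of age 0. swarm is a
    list of position vectors and ages is the parallel list of particle ages."""
    for i, age in enumerate(ages):
        if age > tau:
            swarm[i] = [random.uniform(lower, upper) for _ in range(dim)]
            ages[i] = 0
    return swarm, ages
```

In a full implementation the replaced particle's velocity and pbest would also be reinitialized; those bookkeeping details are omitted here.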

Neighbour selection
The population topology structure has a significant influence on the performance of PSO for multimodal function optimization. Instead of using the gbest topology, an age-based population topology with K neighbours is proposed in this paper. In this age topology, each particle first randomly selects K particles in the population as its neighbours, then chooses the particle with the best fitness among the K neighbours as its nbest. At regular intervals, this procedure is repeated to select new neighbours for each particle. One important question is how to select the K neighbours from the population. In existing work 13, the K neighbours of each particle are randomly selected from the entire population. In PSOA, each particle selects particles not older than itself as its neighbours. One benefit of this strategy is to improve the chance of sharing information with good particles and to reduce the probability of searching in poor directions. Specifically, we select neighbours for particle i as follows.
(1) We first find all particles other than i whose age does not exceed the age of particle i (α_i). Let the number of such particles be N.
(2) If N ⩾ K, randomly choose K particles from the N individuals as the neighbours of particle i. If N < K, choose the global best particle of the current population as its only neighbour.
(3) If N ⩾ K, compare the fitness values of these K particles' pbests and select the best one as the nbest of particle i.
This procedure is executed every age-gap generations. For example, if the value of age-gap is 15, the topology of each particle is reconstructed at generations 15, 30, 45 and so on. To allow each particle to learn more information from good neighbours, the age-gap is set to the maximal age τ in this paper. The main difference between the proposed topology and other random topologies 13 is that it uses age to decide the constitution of neighbourhoods rather than selecting them randomly from the entire population. Since age represents the ability of a particle to update its pbest, searching around young particles may provide more promising solutions.
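The three-step neighbour selection above can be sketched as follows (minimization assumed; the function signature and parallel-list representation are our own):

```python
import random

def select_nbest(i, ages, pbest_fits, k, gbest_index):
    """Age-based neighbour selection for particle i: candidates are all other
    particles no older than i. If at least k candidates exist, sample k of
    them and return the index of the one with the best pbest fitness;
    otherwise fall back to the global best particle."""
    candidates = [j for j in range(len(ages)) if j != i and ages[j] <= ages[i]]
    if len(candidates) < k:
        return gbest_index            # step (2), N < K case
    neighbours = random.sample(candidates, k)
    return min(neighbours, key=lambda j: pbest_fits[j])  # step (3)
```

Note that the youngest particles (age 0) have few or no qualifying candidates and therefore tend to follow gbest, while older particles draw their exemplars from the younger part of the swarm.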

Hypermutation
Hypermutation is a basic mechanism of the cellular immune response and has been simulated in artificial immune systems (AISs) for machine learning and optimization 43. Its distinct feature is that all dimensions in a contiguous region of the vector are mutated. Our preliminary work 12 implemented this idea in PSO and found that it performs well on multimodal problems. Instead of using a random mutation strength as in 12, we use the particle age to determine the range of the contiguous region. We want older particles to have a higher mutation strength to push them to new positions, since they have stagnated or oscillated for a long time.
For particle i with age α_i, if it is selected to be mutated, its mutation length l_i is computed by Eq. (6), where D is the particle dimension and ⌈·⌉ is the ceiling function. If α_i = 0, only one dimension of particle i is chosen to be mutated, which is the same as one-point mutation. As α_i increases, the mutation length l_i also increases, from one up to ⌈D/2⌉, the highest mutation strength allowed in PSOA. In this paper, mutated particles are not considered new particles, because at least half of their dimensions are retained; as a result, their ages remain unchanged. To implement the hypermutation for particle i, a hotspot (mutation point) is first randomly selected among its dimensions. If the distance between the hotspot and the end of the vector is not shorter than l_i, all l_i dimensions from the hotspot onward are mutated (Fig. 2(a)). If the distance is shorter than l_i, the remaining dimensions are taken from the beginning of the vector (Fig. 2(b)).
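The hotspot-and-wraparound scheme of Fig. 2 can be sketched as follows. Eq. (6) is not reproduced in the source, so the linear age-to-length mapping below is an assumption that only matches the described behaviour (l = 1 at age 0, growing to ⌈D/2⌉ as the age approaches τ), not the paper's exact formula:

```python
import random

def hypermutate(x, age, tau, lower, upper):
    """Age-based hypermutation: pick a random hotspot and re-randomize a
    contiguous run of l dimensions starting there, wrapping around to the
    start of the vector if the run passes the end (Fig. 2(b))."""
    d = len(x)
    max_len = -(-d // 2)                                 # ceil(D/2)
    l = 1 + round((max_len - 1) * min(age, tau) / tau)   # assumed stand-in for Eq. (6)
    hotspot = random.randrange(d)
    for offset in range(l):
        j = (hotspot + offset) % d                       # wrap around the end
        x[j] = random.uniform(lower, upper)
    return x
```

Because at most ⌈D/2⌉ dimensions are re-randomized, at least half of the particle's information survives, which is why the paper does not reset the ages of mutated particles.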

Inertia weight adjustment
Although particle replacement can discard aged particles and introduce new particles to replace them, it may lose useful information contained in the discarded particles. Furthermore, adding new particles requires additional fitness evaluations. To make the best use of the existing particles, it may be better to keep them by helping them find better pbests, so it is reasonable to equip old particles with better local search capability. To do this, we build a connection between each particle's age and its inertia weight, decreasing ω as the particle gets old. The inertia weight of particle i (ω_i) is changed according to Eq. (7).
In PSOA, ω is initialized to 0.729, which keeps the search balanced between the global best and personal best positions 27. From (7), particles with age 0 are assigned ω_i = 0.729, and ω_i is decreased by 1/(τ + 1) each time the age of particle i (α_i) increases by one, until α_i exceeds τ. Because particle replacement is performed only every age-gap generations, some particles whose age exceeds τ may remain in the population. We assign these particles age 0 to give them the highest local search ability to update their pbests; they will be excluded in the next replacement if this fails.
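Since Eq. (7) is not reproduced in the source, the following sketch uses an assumed proportional linear decrease (one (τ+1)-th of ω_0 per unit of age) that is consistent with the description: full ω_0 = 0.729 at age 0, steadily shrinking with age, and over-aged particles treated as age 0:

```python
def inertia_weight(age, tau, w0=0.729):
    """Age-based inertia weight (assumed form, not the paper's Eq. (7)):
    young particles keep w0 for global exploration, older particles get a
    smaller weight and hence stronger local search. Ages above tau are
    treated as 0, restoring full search ability before the particle is
    excluded at the next replacement."""
    if age > tau:
        age = 0
    return w0 * (1 - age / (tau + 1))
```

Whatever the exact form of Eq. (7), the design intent is the same: trade exploration for exploitation as a particle's age grows, giving it a last chance to improve its pbest before replacement.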

Setting the maximal age τ
To select an appropriate τ, six 30-D multimodal functions are used to investigate the influence of τ.
The six functions are the Rosenbrock, Griewank, Ackley, Rastrigin, Weierstrass and Schwefel functions 5. First, we run the basic gbestPSO 30 times on each function. Table 1 shows the average and standard deviation of the age of the entire population during the optimization process; the data collected before the termination of each run are used for the statistical analysis. It is obvious that the population ages on the six functions are very close, with a mean value of about seven. This means that, on average, each particle spends about 7 iterations finding a better pbest on these functions. Therefore, the lower limit of τ should be larger than seven, to leave most particles enough time to search. Based on this, we run PSO with particle replacement on the six functions using 15 different values of τ, from 7 to 21. These experiments are also run 30 times at each level, and the averages of the final fitness values are plotted in Fig. 3. Four test functions, namely the Ackley, Rastrigin, Weierstrass and Schwefel functions, are very sensitive to τ, and three of them achieve the best results when τ is 19. The other two test functions, i.e. the Rosenbrock and Griewank functions, also obtain good results when τ is around 19. So the maximal age τ is set to 19 for all benchmark functions used in this paper.

Experimental Results
The experiments conducted in this paper are divided into two parts: the first part is to investigate the respective influence of the four proposed schemes to PSO and the second part compares PSOA with other well-known PSOs.

Test functions
In order to test PSOA on multimodal functions and compare it with other algorithms, we choose fifteen widely used multimodal test functions from 5,44,45. The fifteen test functions are listed in Table 2. To make the separable functions non-separable, an orthogonal matrix M is used to rotate the coordinate system: the original vector x is multiplied by M to generate a vector y = M * x. In this paper, the orthogonal matrix M is generated through Salomon's method 5,29,47. Table 2 also shows the search range of each dimension of the particle (column 2), the global optimal fitness f_min (column 3) and the acceptable accuracy for each function (column 4). In the following experiments, the initialization range is the same as the search range. The accuracy index 7 is used to measure the desired accuracy for each function.

Experimental setup
Experiments are conducted to compare PSOA with seven existing PSO variants on the fifteen test functions. Table 3 lists the algorithm names and parameter configurations. All parameter settings are based on the suggestions in the corresponding references. The first PSO is Gaussian PSO (GPSO), which uses a Gaussian mutation 48. The second is Clerc's standard PSO 2007 version (SPSO07) 14, which uses a random topology. The third is the fully informed particle swarm (FIPS) 31, in which each particle is related to all other particles of the population. The fourth is dynamic multi-swarm PSO (DMS-PSO) 13, in which a dynamic random topology is applied to organize learning among particles. The fifth is comprehensive learning PSO (CLPSO) 5, which uses all other particles' personal best positions to update a particle's velocity. The sixth is incremental PSO (IPSO) 6, which uses incremental social learning to control the population size dynamically. The last PSO used for comparison is the efficient population utilization strategy PSO (EPUS-PSO) 29, which also employs a dynamic population size.
All algorithms were programmed and compiled under Matlab R2011b on an Intel Core i5-760 2.8GHz computer running Microsoft Windows 7. In the following experiments, the population size is set to 40 for all algorithms except IPSO and EPUS-PSO. In IPSO and EPUS-PSO, the minimum population size is set to 1, and the maximum population size is set to 1000 and 20 respectively, according to their original settings in 6,29. Furthermore, all algorithms use the same maximum number of fitness evaluations (FEs), 2.0e+05, in each run for all functions. Each algorithm is run with the same parameter setting across all test functions. To verify the effectiveness of the proposed algorithms, each algorithm is run 30 times independently on every function, and the mean and standard deviation are calculated. Besides, two additional indexes, success rate (SR) and convergence speed (CS), are used for algorithm comparisons. The SR index is the percentage of runs that reach the desired accuracy, and CS is measured as the mean number of FEs required to reach an acceptable solution among the successful runs 7. Note that, in all the following tables of experimental results, the better results on each index are shown in bold.
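The two comparison indexes can be computed as follows; the run representation (one `(final_error, fevals_to_accuracy)` pair per run) is an illustrative choice of ours, not from the paper:

```python
def success_rate_and_speed(runs, accuracy):
    """SR: fraction of runs whose final error reached the acceptable accuracy.
    CS: mean number of fitness evaluations to reach an acceptable solution,
    averaged over successful runs only. fevals_to_accuracy is None for
    unsuccessful runs, and CS is undefined (NaN) when no run succeeds."""
    successes = [fe for err, fe in runs if err <= accuracy and fe is not None]
    sr = len(successes) / len(runs)
    cs = sum(successes) / len(successes) if successes else float('nan')
    return sr, cs
```

Averaging CS over successful runs only, as done here, is why SR and CS must be read together: a fast CS obtained on a low SR reflects only the easy runs.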

Effects of the proposed strategies
In this section, we investigate whether the proposed strategies, i.e. particle replacement, neighbour selection, hypermutation and inertia weight adjustment, can significantly improve PSO's performance on multimodal functions. To do this, we add the strategies to the basic PSO one by one to produce four different algorithms, as shown in Table 4, named PSOA, PSOA1, PSOA2 and PSOA3 respectively. We then run the basic PSO, PSOA and the three further PSOA variants on the fifteen test functions listed in Table 2, 30 times independently each. Table 5 compares the experimental results of the five algorithms; the better results on each test function are highlighted in bold. It is noticeable that the performance fluctuates considerably when different aging operators are used. Firstly, the basic PSO gave better results than any PSOA variant on f1. For PSOA, there was no significant enhancement over the basic PSO on any test function, even though particle replacement was used. On the contrary, PSOA with the age-based random topology (PSOA1) performed better than PSO and PSOA on most functions, except f1 and f7; in addition, PSOA1 gave the best results on f15. Furthermore, PSOA2 gave better solutions than PSOA1 on all functions except f13-f15, and on f8, f10, f11 and f12 PSOA2 achieved the best results among the five PSOs. Finally, PSOA3 achieved the best accuracy on f2-f7, f9, f13 and f14. Table 5 also compares the t-test results among the five PSOs. The first group of t-tests is between PSOA and PSO; it can be noted from the results (B vs A column) that PSOA performed worse than PSO on all fifteen functions. It is obvious that PSOA1 was significantly better than PSOA on 12 out of the 15 benchmarks. Furthermore, PSOA2 outperformed PSOA1 on nine functions. From the results of the last group (PSOA3 vs PSOA2), PSOA3 performed significantly better than PSOA2 on ten functions.
Table 6 compares the success rates and convergence speeds of PSO and the four PSOA variants. On the success rate measure, the basic PSO only performed well on f1, while PSOA fully converged to the desired accuracy on f1 and f12 over the 30 runs. Although the average success rate of PSOA1 (55.3%) was significantly better than that of PSO and PSOA, it only gave a 100% success rate on f3 and f12. Compared to PSOA1, PSOA2 showed better success rates on nine test functions, i.e. f1-f5 and f8-f11. Finally, PSOA3 obtained the best average success rate (75.3%) among the five PSO variants and converged on nine test functions. The other measure concerns the convergence speed in the successful runs; on this measure, PSO, PSOA and PSOA3 gave better results on one, four and seven test functions, respectively.
From the statistical results shown in Table 5, it seems that particle replacement alone cannot improve the performance of PSO. This is understandable, because there is no additional mechanism to allow other trapped particles to learn the information in the new particles; on the contrary, the new particles are easily attracted by the particles in local optima. This conjecture is verified by the results of PSOA1: when the age-based topology is added to PSOA, its solution accuracy, convergence speed and success rate on most test functions are all enhanced substantially. This is because newborn and young particles can provide more promising information than the gbest. More importantly, the age-based population topology gives trapped particles the chance to learn this information and escape from local optima. Fig. 4 compares the change of population diversity and fitness value when applying PSO, PSOA and PSOA1 to the Rastrigin function f4. The results are averaged over 30 independent runs, and the diversity is computed according to Morrison and De Jong's moment-of-inertia diversity measure 49. From Fig. 4, it is clearly observed that PSOA failed to maintain diversity during the search, whereas PSOA1 was able to keep the diversity at an effectively high level. A similar tendency was observed on all of the other fourteen test functions.

Comparisons with other PSOs
In this section, PSOA2 and PSOA3 are compared with seven other improved PSO variants, which are listed in Table 3. Table 7 shows the average value and standard deviation results of the nine PSOs. PSOA2 gave the best results on f8, f10, f11 and f15. PSOA3 performed the best on eight of the fifteen functions, i.e. f2−f5, f7, f9 and f13−f14. FIPS got better results on function f12, and CLPSO performed very well on functions f1 and f6. Overall, it can be observed that PSOA2 and PSOA3 gave the better results on most of the test functions. Table 8 lists the t-test results at the 5% significance level between PSOA3 and the other seven PSO variants (excluding PSOA2). "+" and "-" indicate that PSOA3 is significantly better and worse than the compared algorithm, respectively; "≈" indicates that the difference is not significant. From Table 8, it is clear that PSOA3 was significantly better than all seven PSO variants on functions f2, f4, f5, f8 and f15. On functions f3, f9, f12 and f13, PSOA3 outperformed six of the seven compared PSOs. On functions f1, f10 and f14, it beat five of the seven algorithms, and on functions f7 and f11 it beat four of them. Finally, it performed poorly on the Schwefel function f6. From another perspective, PSOA3 performed significantly better than GPSO and DMS-PSO on most of the test functions and worse than them only on function f6. SPSO07, IPSO and EPUS-PSO failed to outperform PSOA3 on any of the test functions. FIPS was beaten by the proposed algorithm on twelve of the fifteen functions but beat it on functions f6 and f12. The proposed algorithm also beat CLPSO on twelve test functions and performed worse on three, i.e. f1, f6 and f11. Table 9 compares the success rate and convergence speed in the successful runs of the nine PSO variants.
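The "+/−/≈" entries in Table 8 follow the standard two-sample t-test recipe. A minimal sketch of that comparison is given below; the function and variable names are ours, and we assume a minimization setting, so a smaller mean error is better:

```python
import numpy as np
from scipy import stats

def significance_symbol(ours, theirs, alpha=0.05):
    """Return '+' if `ours` has a significantly smaller mean error than
    `theirs`, '-' if significantly larger, and '≈' if the difference is
    not significant at level `alpha`, using a two-sample t-test."""
    t, p = stats.ttest_ind(ours, theirs)
    if p >= alpha:
        return "≈"
    return "+" if np.mean(ours) < np.mean(theirs) else "-"
```

Each entry would be computed from the 30 final error values of the two algorithms on one test function.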
The results show that PSOA2 and PSOA3 achieved the highest average success rates, 70.7% and 75.3% respectively; DMS-PSO ranked second at 62.7%, followed by CLPSO, SPSO07, GPSO (FIPS), IPSO and EPUS-PSO. Moreover, PSOA3 converged very fast on most of the functions and gave the best performance on functions f1, f4, f5 and f8−f10. GPSO was fastest on f6 and f12, and SPSO07 was fastest on f13. In addition, IPSO was very fast on f3, and EPUS-PSO performed well on f2 and f11. Fig.5 and 6 show the convergence plots of PSOA3, DMS-PSO, CLPSO and SPSO07, the best performers among the seven compared PSOs on the fifteen test functions. It can be noted that PSOA3 converged fast to the best solution on most of the test functions, especially on f2−f5, f8, f9, f12 and f15. Overall, compared to the other seven PSOs, PSOA2 and PSOA3 show better performance in both success rate and convergence speed.

Conclusion
In this paper, we presented a novel PSOA to cope with premature convergence when solving multimodal functions. First, we gave a new definition of age for PSO, which uses the update of each particle's pbest to determine its age. One useful feature of particle age is that it can recognize both fitness stagnation and fitness oscillation. Based on this, we first introduced the original aging operator (particle replacement) into PSO. However, much valuable information in the new and young particles was not fully utilized. To exploit this information, three age-related operators were proposed in this paper, i.e. age-based neighbourhood selection, age-based hypermutation and age-based inertia weight decreasing. Experiments were conducted on 15 multimodal test functions. From the results, we can summarize some distinct features of the PSOA variants as follows.
(1) It is found that the isolated particle replacement operator fails to improve PSO significantly in either population diversity or convergence accuracy. This is because the added particles are easily attracted by the gbest particle if there is no other strategy to stop them. On the other hand, the particle replacement operator is necessary for the proposed strategy because it provides the age diversity that is the foundation of the other three operators.
(2) Age-based neighbour selection is able to maintain effective population diversity and improve convergence results. By the particle age definition, younger particles are more promising than older ones. Selecting particles with the same or smaller age as neighbours distinguishes this scheme from other neighbourhood topologies.
(3) Age-based hypermutation and inertia weight decreasing give older particles more chances to generate a better pbest so that they can avoid being discarded. Older particles play a significant role in retaining effective particles and thereby reducing fitness evaluations that would otherwise be wasted. In addition, hypermutation makes PSOA insensitive to different initialization conditions.
(4) The three proposed PSOAs are less sensitive to coordinate rotation and shift than the seven compared PSOs. However, they may not be efficient in optimizing nonseparable problems with deep local optima far from the global optimum, such as function f6.
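To make the age bookkeeping behind points (1)–(3) concrete, a minimal sketch is given below. All names, the MAX_AGE threshold and the linear weight schedule are illustrative assumptions of ours, not the paper's exact formulas:

```python
import random

MAX_AGE = 20             # assumed replacement threshold (illustrative)
W_MAX, W_MIN = 0.9, 0.4  # assumed inertia weight range (illustrative)

def update_age(p):
    """Age resets when the particle's pbest improves (minimization) and
    grows by one otherwise, so age counts iterations without progress."""
    if p["fitness"] < p["pbest_fitness"]:
        p["pbest_fitness"] = p["fitness"]
        p["age"] = 0
    else:
        p["age"] += 1

def select_neighbours(p, swarm):
    """Age-based neighbourhood: learn only from particles whose age is
    the same or smaller, i.e. the younger, more promising ones."""
    return [q for q in swarm if q is not p and q["age"] <= p["age"]]

def inertia_weight(p):
    """Age-based inertia weight decreasing: the older the particle, the
    smaller its weight, pushing it to refine its pbest before replacement."""
    return W_MAX - (W_MAX - W_MIN) * min(p["age"], MAX_AGE) / MAX_AGE

def maybe_replace(p, dim, lo, hi):
    """Particle replacement: reinitialize a particle exceeding MAX_AGE."""
    if p["age"] > MAX_AGE:
        p["position"] = [random.uniform(lo, hi) for _ in range(dim)]
        p["age"] = 0
```

The point of the combined design is visible here: replacement alone (last function) only injects random particles, while the neighbourhood and weight rules let the rest of the swarm actually exploit the age information those particles carry.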
It can be seen that the three proposed aging operators can prevent particles from getting stuck in local optima, effectively maintain population diversity, guarantee robustness and improve the performance of PSO. In future work, we will apply the proposed algorithm to real-world optimization problems, such as data clustering and image segmentation.