An Optimal Task-Scheduling Strategy for Large-Scale Astronomical Workloads using In-transit Computation Model

The Sloan Digital Sky Survey (SDSS) has been one of the most successful sky surveys in the history of astronomy. To map the universe, SDSS uses its telescopes to take pictures of the sky over the whole survey area. The total SDSS data volume is now larger than 125 TB, since every night the telescopes produce about 200 GB of data. To improve the processing efficiency of such large-scale astronomical data, we develop an optimal task-scheduling strategy using an in-transit computation model under fog computing. Within the proposed strategy, we design a global optimization technique to derive an optimal load distribution among heterogeneous computational resources. Finally, we conduct various experiments to illustrate the correctness and effectiveness of the proposed strategy. Experimental results show that it can significantly decrease the processing time of large-scale workloads.



Introduction
For millennia, twinkling stars in the night sky have always inspired our curiosity about the universe. Astronomers have launched various scientific sky surveys in the last century attempting to map the universe, over ever-larger areas, to ever-greater depths, and over an ever-increasing range of wavelengths. Among these surveys, the Sloan Digital Sky Survey (SDSS) 1 has created the most detailed three-dimensional maps of the universe ever made, with deep multi-color images of more than one third of the entire night sky, and spectra for more than three million astronomical objects.
SDSS has progressed through several phases. In its first five years of operations, SDSS-I (2000-2005) carried out deep multicolor imaging over 8000 square degrees and measured spectra of more than 700,000 objects. With an ever-growing collaboration, SDSS-II (2005-2008) completed the goals of imaging half the northern sky and mapping the three-dimensional clustering of one million galaxies and 100,000 quasars. SDSS-III (2008-2014) undertook a major upgrade of the venerable spectrographs 2. In July 2014, SDSS-IV was launched. It is an extensive imaging and spectroscopic survey of the Northern and Southern sky, using a dedicated 2.5-meter telescope located in southeastern New Mexico and the du Pont Telescope in northern Chile 2. Each telescope is fixed to point directly up at the sky and images a "stripe" of the sky over the course of the night. As the Earth rotates, more of the sky becomes visible above the telescopes. Every night the telescopes produce about 200 GB of data. The total SDSS data volume is now larger than 125 TB 2.
Each image taken by the telescopes is composed of myriad pixels, each of which captures the brightness of a tiny point in the sky. But the sky is not made of pixels. Data managers for SDSS need to extract digitized data from the images and process the extracted data to produce information that can be used to identify and measure properties of stars and galaxies. It is worth noting that scientists must handle the astronomical workloads as quickly as possible, because SDSS astronomers need the information to configure their telescopes to work most efficiently during the next dark phase of the moon. If too much time goes by, they might miss the immediate next observing season of the target objects.
Such large-scale astronomical data could not be processed efficiently without network-based computing systems. One of the key issues in networked computation is obtaining an optimal scheduling strategy, including the partition and distribution of workload among computational resources, to achieve the shortest processing time. An optimal scheduling strategy depends mainly on the network architecture as well as on the number of computational resources and their computing capabilities. Mani and Ghose 3 studied the distribution of divisible workload in a homogeneous linear network and derived recursive equations for obtaining an optimal load partition. Later, asymptotic solutions for homogeneous bus networks were obtained 4. For heterogeneous star networks, Bharadwaj et al. 5 derived a closed-form expression for an optimal load partition achieving the shortest processing time. The task-scheduling problem turns out to be more difficult when practical issues like computation and communication start-up overheads are considered. Carroll 6 and Ghanbari 7 studied optimal scheduling strategies for bus and tree networks with arbitrary start-up overheads, respectively. Later on, similar studies have been made on a variety of distributed networks, such as Gaussian, mesh, and torus networks 8, complete b-ary tree networks 9, heterogeneous clusters 10, and cloud computing systems 11.
It should be noted that even the most advanced cloud computing architecture still faces challenges in handling such large amounts of astronomical data. Fog computing is becoming widely known as a paradigm that extends cloud computing to edge devices and processes data directly on those devices, thus minimizing the amount of data transferred to the cloud 12. One ubiquitous edge device is the network data center. Compared to data centers hosted by cloud providers, network data centers are managed and operated by network providers, and they constitute an important part of the current Internet infrastructure. Fog computing can exploit network data centers along the path while workloads are in transit from the user side to the cloud data center, so that the spare compute capacity of network data centers is utilized more efficiently and the processing time of workloads is decreased at the same time. With this idea in mind, Zou et al. 13,14 proposed an in-transit computation infrastructure composed of an ensemble of computational resources, including a cloud data center and a certain number of network data centers connecting the source (user) and destination (cloud data center). We employ this in-transit computation model in this work to improve the processing efficiency of astronomical workloads. Our main objective is to derive an optimal task-scheduling strategy for in-transit computation under fog computing so that the processing time of large-scale workloads is minimized.
The remainder of this paper is organized as follows. Section 2 establishes a novel task-scheduling model for in-transit computation. To solve this model, we design an effective genetic algorithm in Section 3, which is evaluated through experiments in Section 4. The last section concludes the paper.

In-transit Computation under Fog Computing
In this section, we first formally define the task-scheduling problem we address and introduce the notations and definitions used throughout this paper. Then we propose a novel task-scheduling model for in-transit computation of large-scale workloads under fog computing.

Problem Description
Suppose that an astronomer needs to compute a workload W_total, for example searching an astronomical database or analyzing astronomical data for a certain purpose. The location where the workload is stored is defined as source s, while the remote cloud data center is defined as destination d. Workload W_total is transferred from source s to destination d through a network path composed of n in-transit network data centers. The problem lies in how to take full advantage of in-transit computation by deriving an optimal load distribution strategy among (n + 1) fog nodes, including the n in-transit network data centers and the cloud data center, also denoted as f_{n+1}. Note that astronomical data, although large in size, are generally partitionable, meaning that they can be partitioned into any number of fractions, or at least into fine-grained fractions, and that there are no precedence relationships among these fractions, so they can be independently processed on distributed compute platforms. Hence, workload W_total will be partitioned into (n + 1) fractions and processed by (n + 1) fog nodes independently. Note that source s does not participate in workload computation. We can observe from Fig. 1 that after receiving the whole workload from source s, fog node f_1, an in-transit network data center, keeps a fraction α_1 of W_total for itself and transmits the remaining (W_total − α_1) to its right immediate neighbor f_2. Similarly, fog node f_i keeps a fraction α_i of W_total for itself and transmits the remaining (W_total − ∑_{j=1}^{i} α_j) to fog node f_{i+1}. The last node f_{n+1}, also known as the destination cloud data center, upon receiving its load fraction α_{n+1}, does only computation. We have ∑_{i=1}^{n+1} α_i = W_total and 0 < α_i < W_total. The total processing time T is the time at which the entire workload W_total has been processed. It is given by the maximum of the finish times of all fog nodes. Thus, when all fog nodes stop computing at the same time instant, the total processing time T is minimized. Source s is assumed to start distributing the whole workload W_total to fog node f_1 at time t = 0. It takes node f_i time {o_i + z_i × (W_total − ∑_{j=1}^{i} α_j)} to transmit the load fraction (W_total − ∑_{j=1}^{i} α_j) to node f_{i+1}, and then it takes f_i time (c_i + w_i α_i) to finish computing its assigned load fraction α_i. Here o_i refers to the communication start-up overhead of link l_i and c_i represents the computation start-up overhead of fog node f_i, while z_i indicates the ratio of the time taken by link l_i to transmit a given workload to that taken by a standard link, and w_i represents the ratio of the time taken by node f_i to compute a given workload to that taken by a standard compute resource. It may be noted that the computation speed of the last fog node f_{n+1} (i.e., the cloud data center) is much faster than that of the in-transit nodes. Hence, w_{n+1} < min{w_1, w_2, ..., w_n}.
Let P_i denote the time when node f_i finishes transmitting load fractions to node f_{i+1}. We have

P_i = P_{i−1} + o_i + z_i (W_total − ∑_{j=1}^{i} α_j), i = 1, ..., n,

where P_0 = o_0 + z_0 W_total stands for the transmission time of the total workload distributed from source s to fog node f_1. For the last node f_{n+1}, we have P_{n+1} = P_n. Each fog node starts computing only after it finishes transmitting the remaining workload to its immediate neighbor. The finish time of node f_i can therefore be written as

T_i = P_i + c_i + w_i α_i, i = 1, ..., n + 1.

Finally, we obtain the processing time T of the total workload as

T = max{T_1, T_2, ..., T_{n+1}}.
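To make the recursion concrete, the following Python sketch evaluates P_i, the per-node finish times, and the total processing time T for a given load partition. The function name and parameter layout are illustrative, not from the paper; only the recurrence itself follows the definitions above.

```python
def processing_time(alpha, o, z, c, w, W_total):
    """Total processing time T for a load partition.

    alpha : load fractions alpha_1..alpha_{n+1} (must sum to W_total)
    o, z  : start-up overhead and relative speed of links l_0..l_n,
            where l_0 is the link from source s to node f_1
    c, w  : start-up overhead and relative speed of nodes f_1..f_{n+1}
    """
    n = len(alpha) - 1
    P = o[0] + z[0] * W_total          # P_0: whole workload arrives at f_1
    sent = 0.0
    finish = []
    for i in range(1, n + 1):          # in-transit nodes f_1..f_n
        sent += alpha[i - 1]
        P += o[i] + z[i] * (W_total - sent)                    # P_i
        finish.append(P + c[i - 1] + w[i - 1] * alpha[i - 1])  # T_i
    # Last node f_{n+1}: P_{n+1} = P_n, and it only computes its fraction
    finish.append(P + c[n] + w[n] * alpha[n])
    return max(finish)
```

For an optimal partition, all entries of `finish` coincide, which is exactly the equal-finish-time condition stated above.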

In-transit Computation Model
Here we formulate a new in-transit computation model under fog computing:

min T = max{T_1, T_2, ..., T_{n+1}}

subject to:
(i) 0 < α_i < W_total, i = 1, ..., n + 1;
(ii) ∑_{i=1}^{n+1} α_i = W_total.

There are (n + 1) variables involved in this model. Constraints (i) and (ii) indicate that the load fractions assigned to fog nodes should be nonnegative and not larger than the entire workload, and that the sum of all load fractions equals the entire workload.

Optimal Task-Scheduling Strategy
In this section, we design a Genetic Algorithm (GA) searching for an optimal load partition A = {α_1, α_2, ..., α_{n+1}} for the proposed in-transit computation model. We select GAs to solve our model because GAs have been proven to be a promising technique for combinatorial optimization problems, especially for task-scheduling problems.

Encoding and Genetic Operators
The key point of finding an optimal solution using GAs is to develop an encoding scheme that can represent the problem directly while satisfying the problem constraints easily. In this paper, an individual is real-coded directly as I = (α_1, α_2, ..., α_{n+1}). For a given individual I, if ∃i, α_i < 0 or ∑_{i=1}^{n+1} α_i > W_total, then individual I violates the constraints of the proposed model and is considered to be invalid.
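A validity check for this encoding can be sketched as a small hypothetical helper, here enforcing the model constraints 0 < α_i < W_total and ∑ α_i = W_total with a small tolerance for floating-point sums:

```python
def is_valid(individual, W_total, eps=1e-9):
    """True iff every gene lies in (0, W_total) and the genes sum to W_total."""
    return (all(0 < a < W_total for a in individual)
            and abs(sum(individual) - W_total) <= eps)
```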
As a simple example, assume that there are n + 1 = 6 fog nodes in the system, comprising 5 in-transit network data centers along the path from the source plus the destination (cloud data center). The size of the entire workload is 1000 units. A possible encoding is then I = (94, 78, 60, 68, 50, 650).
We observe that ∀i, α_i > 0 and ∑_{i=1}^{6} α_i = W_total = 1000; thus individual I is valid, as it satisfies all constraints in our model. It is worth noting that the last fog node f_6 is assigned the largest load fraction, α_6 = 650 > max{α_1, α_2, α_3, α_4, α_5}. This is because the last fog node represents the cloud data center, which has much higher compute capability than the other fog nodes (network data centers). This is also validated by our experimental results as shown in Section 4.
According to our proposed in-transit computation model, we have the special constraint ∑_{i=1}^{n+1} α_i = W_total. Therefore, if we adopt two-point crossover, it may produce invalid offspring. Hence, we normalize the newly generated individuals to ensure that the total value of all genes equals the entire workload W_total.
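One simple repair scheme consistent with this description, assumed here, is proportional rescaling so that the genes of an offspring again sum to W_total:

```python
def normalize(individual, W_total):
    """Proportionally rescale genes so their total equals W_total."""
    total = sum(individual)
    return [a * W_total / total for a in individual]
```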
We adopt two-point mutation on offspring generated by crossover according to a user-definable mutation probability. This probability should be set low; otherwise, the search turns into a primitive random search. In detail, we randomly generate two integers p and q satisfying 1 ≤ p < q ≤ (n + 1), and then exchange genes α_p and α_q of individual I. Offspring generated by this mutation operator satisfy all of the constraints in our proposed model by default, since swapping two genes leaves their sum unchanged.
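The mutation operator can be sketched as below; the function and parameter names are illustrative. Because swapping preserves the sum of the genes, no renormalization is needed afterwards.

```python
import random

def two_point_mutation(individual, p_mut, rng=random):
    """With probability p_mut, swap two randomly chosen distinct genes."""
    child = list(individual)
    if rng.random() < p_mut:
        p, q = rng.sample(range(len(child)), 2)  # two distinct positions
        child[p], child[q] = child[q], child[p]
    return child
```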

Local Search
To accelerate the convergence of the proposed GA, we introduce a local search operator. The main idea is to transfer a proper amount of load from the fog node with the longest processing time T_max to the one with the shortest processing time T_min, so that all fog nodes eventually stop computing at the same time instant. The local search operator proceeds as follows.
Step 1: For a given individual I, compute the finish time of every fog node. Step 2: Among all fog nodes, find node f_max with the longest processing time T_max and node f_min with the shortest processing time T_min, and calculate their time difference ∆ = T_max − T_min. Step 3: Let β = ∆ / max{z_max, z_min}, and update individual I by α_max = α_max − β and α_min = α_min + β. Figure 2 shows a timing diagram that corresponds to an individual before applying local search. As illustrated in Fig. 2, node f_1 has the longest processing time T_1 and f_3 has the shortest processing time T_3; thus f_max = f_1 and f_min = f_3. After load balancing between f_1 and f_3 by the local search operator, a possible timing diagram is shown in Fig. 3. It can be observed that the time difference between T_1 and T_3 in Fig. 3 is much smaller than that in Fig. 2. Hence, the total processing time of the entire workload is decreased.
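A single move of the operator can be sketched as follows, applying the β rule exactly as stated above. The parameter `z` mirrors the text's z_max, z_min notation; the function name and argument layout are illustrative.

```python
def local_search_step(individual, finish_times, z):
    """Shift beta units of load from the longest-finishing node to the
    shortest-finishing one, with beta = Delta / max(z_max, z_min)."""
    i_max = max(range(len(finish_times)), key=finish_times.__getitem__)
    i_min = min(range(len(finish_times)), key=finish_times.__getitem__)
    delta = finish_times[i_max] - finish_times[i_min]
    beta = delta / max(z[i_max], z[i_min])
    child = list(individual)
    child[i_max] -= beta
    child[i_min] += beta
    return child
```

Note that the move conserves the total load, so constraint (ii) of the model remains satisfied.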

Framework of the Proposed Algorithm
Once the encoding scheme is defined, a GA initializes a population of individuals and then improves them through repeated applications of genetic operators, including crossover, mutation, local search, and selection. Given workload W_total, population size Popsize, crossover probability p_cros, mutation probability p_mut, elitist number E = 5, and a stop criterion, the framework of our proposed GA is given as follows.
Step 1 (Initialization): Randomly generate Popsize individuals as the initial population Pop(0) according to the encoding scheme. For each I ∈ Pop(0), compute the processing time T of workload W_total and take 1/T as the fitness value of I. Let the generation number t = 0.
Step 2 (Crossover): Select Popsize individuals into the crossover pool from Pop(t) by roulette wheel selection. Apply two-point crossover to each pair of parents selected from the crossover pool according to p_cros, and then normalize the newly generated offspring to ensure that ∑_{i=1}^{n+1} α_i = W_total. All offspring constitute a set denoted by O_1(t).
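Roulette wheel selection, used in the steps above, draws individuals with probability proportional to their fitness 1/T. A minimal sketch, with illustrative names, follows:

```python
import random

def roulette_select(population, fitness, k, rng=random):
    """Draw k individuals, each picked with probability fitness[i] / sum(fitness)."""
    total = sum(fitness)
    picks = []
    for _ in range(k):
        r = rng.uniform(0.0, total)
        acc = 0.0
        for individual, f in zip(population, fitness):
            acc += f
            if acc >= r:         # the wheel stops inside this individual's slice
                picks.append(individual)
                break
    return picks
```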
Step 3 (Mutation): Apply two-point mutation to each of the selected individuals from O_1(t) according to p_mut. All newly generated offspring constitute a set denoted by O_2(t).
Step 4 (Local Search): Apply the local search operator to each individual in the set O_1(t) ∪ O_2(t).
Step 5 (Selection): Select the best E individuals for the next population Pop(t + 1) from the set Pop(t) ∪ O_1(t) ∪ O_2(t). Select the remaining Popsize − E individuals for Pop(t + 1) by roulette wheel selection, also from the set Pop(t) ∪ O_1(t) ∪ O_2(t). Let t = t + 1.
Step 6 (Stopping Criterion): If a fixed number of generations has been reached, stop and return the best individual I in the current population; otherwise, go to Step 2.

Experimental Results and Analysis
As we mentioned earlier, every night the telescopes of SDSS, including the primary 2.5m telescope, 0.5m photometric telescope, and 10 micron all sky scanner, produce about 200 GB of raw imaging data.
In our simulation, we consider this actual data size and normalize it to 10000 units. A series of operators is then required to process these large-scale telescope imaging data under fog computing, ultimately producing a variety of products including images with instrumental signatures removed, a photometric solution for the night, and a catalog of objects found in the data. The computation speed of each fog node processing every unit of astronomical data is recorded in Table 1.
In each run of our proposed GA, the following parameters are set: Popsize = 100, crossover probability p cros = 0.8, mutation probability p mut = 0.02, elitist number E = 5, and stop criterion t = 2500.

Correctness Evaluation
We conduct two experiments to validate the correctness of our proposed GA. In each experiment, we employ a fog computing system with 20 fog nodes, including 19 in-transit network data centers and one cloud data center. In the first experiment, we fix the system parameters as given in Table 1 and vary the workload size from 500 to 2500 units. Figs. 4 and 5 collect the experimental results. In the second experiment, we fix the workload size at W_total = 1000 units and vary the network scenarios such that the compute capability of the cloud data center is q times more powerful than that of the in-transit fog nodes, where q ∈ {5, 10, 15, 20, 25}. Figs. 6 and 7 record the results.
We observe from Figs. 5 and 7 that all fog nodes stop computing at the same time in every test. Hence, the proposed algorithm can obtain an optimal task-scheduling strategy that achieves the minimum processing time. As expected, we can see from Figs. 4 and 6 that the load fraction assigned to the last node is much larger than that assigned to the other fog nodes, because the last node represents a cloud data center with high-performance capability, while the other nodes are in-transit network data centers with relatively low-performance capabilities.

Performance Evaluation
To evaluate the effectiveness of the proposed algorithm, we make two comparisons between our algorithm, labeled "In-transit computation" in the experimental results, and the task-scheduling strategy with only the cloud data center performing computation, labeled "No in-transit computation." Figure 8 records the comparison results obtained for different workloads ranging from 1000 to 10000 units, while Fig. 9 collects the experimental results obtained under different network scenarios with the network size varying from 10 to 30.
It can be observed from Figs. 8 and 9 that the processing time obtained by the "In-transit computation" strategy is much less than that of the "No in-transit computation" strategy in every test, and that the time difference between them grows with increasing workload size and network size. As shown in Fig. 8, when the workload size is as large as 10000 units, the "In-transit computation" strategy reduces the processing time by 65.4% compared to the "No in-transit computation" strategy. Similarly, Fig. 9 shows that when there are 30 fog nodes in the network system, the "In-transit computation" strategy reduces the processing time by over 55.3%. Therefore, it is clear that our proposed algorithm for in-transit computation can derive an optimal task-scheduling strategy that significantly decreases the processing time of large-scale workloads. This holds even when the in-transit fog nodes are not very powerful in computation compared to the cloud data center.

Conclusions
In this paper, we have addressed the task-scheduling problem for in-transit computation of large-scale astronomical workloads under fog computing. We built a novel task-scheduling model and proposed a genetic algorithm to derive an optimal load distribution strategy. We explicitly considered the astronomical imaging data taken by the telescopes of SDSS as the reference data volume in our extensive experiments. We demonstrated that the proposed algorithm can significantly decrease the processing time of large-scale workloads through in-transit computation. An important and immediately useful extension of the study posed in this paper is considering complex networks with more than one

Fig. 1. Timing diagram for in-transit computation.

Fig. 2. Timing diagram before applying local search.

Fig. 8. Comparison between in-transit computation and no in-transit computation for different workloads.

Fig. 9. Comparison between in-transit computation and no in-transit computation under different network scenarios.