Paper for RSP'99

PARTITIONING FRAMEWORK FOR LESS RESTRICTED PARTITIONING PROBLEMS

Hyunok OhSoonhoi Ha

The Department of Computer Engineering, Seoul National University
Shinlimdong, Kwanakgu, Seoul 151-742, Korea
{oho,sha}@comp.snu.ac.kr

Abstract

Since the hardware-software partitioning problem is a key aspect of the codesign of digital electronic systems, extensive researches have been performed with diverse definitions of partitioning problems. However, existent partitioning solutions are not applicable to many real applications partly because of restricted input specification or insufficient constraints. In this paper, we aim to identify the remaining issues to be solved in order to enlarge the target applications of automatic partitioning. We define the problem space with two axes of input specification and partitioning constraints and position each previous hardware/software partitioning algorithm in that space. We also list some of future partitioning problems that extend both dimensions of the problem space. At last we solve one remaining partitioning problem and show the experimental results.

1Introduction

The main objective of the codesign methodology is to find out the optimal system architecture that performs the desired system function. Therefore, supporting the designer in choosing the right mix of components and interconnection architectures is essential to the codesign methodology. We define the hardware-software partitioning problem in a broad sense as follows: take as input a function specification and produce as output an architecture and an assignment of functions to components under specific design constraints. The surveys of partitioning algorithms can be found in [1]-[3].

The partitioning problem includes three sub-problems while the second sub-problem has been recognized as the partitioning problem in a narrow sense. By no means these subproblems are independent of each other and tackled separately.

Component selection : determine the processing elements such as processors, hardware modules, and the interconnection architecture. Choosing an implementation of a hardware module is also included in this sub-problem.

Mapping : determine on which component each task is mapped.

Scheduling : determine the order of task executions on each component. The execution order is usually determined statically and fixed at run-time in a processor. Resource sharing problem of hardware modules belongs to this sub-problem.

Despite relatively long history of active researches, existent partitioning solutions are still not applicable to many real applications partly because of restricted input specifications or constraints as well as the large time complexity of partitioning algorithms themselves.

2Partitioning Algorithms : Past and Present

The early partitioning algorithms[4][5] have a rather restricted problem space and a trivial solution space. For example, the input task graph is an acyclic task graph with a small number of nodes whose execution speed and cost are assumed to be fixed. The system architecture is also fixed from the beginning.

Recent partitioning algorithms broaden the solution space and tackle the architecture selection problem with the narrow-sense partitioning problem at the same time. Some researchers use the term “cosynthesis” to mean the partitioning problems in this paper. However, their solution spaces can be further classified, depending on the target architecture in mind.

There are two classes of target architectures: one is a system-on-chip(SOC) and the other is a distributed heterogeneous embedded(DHE) system. The solution space for SOC design generally consists of a single processor core and a good deal of hardware implementation possibilities for each function block[6]-[17]. Therefore, a significant part of the component selection sub-problem is devoted to find out the optimal implementation of each hardware function block under design constraints such as execution speed, cost, and power consumption. If there are multiple processor cores available, processor selection is also included in the component selection sub-problem.

On the other hand, the DHE design concerns with a well-defined function module as an atomic block of the input task graph[18]-[31]. The hardware modules will be mapped to a predefined hardware library among a few candidates at best. Therefore, the solution space usually consists of many programmable processors and customized hardware component libraries. A customized hardware module is regarded as a customized processing element (PE), thus is not distinguished from programmable processors.

Since the partitioning problems for the DHE design do not consider the implementation selection of hardware modules, the solution space is usually smaller. Instead, they aim to synthesize the high level architecture synthesis such as communication interface synthesis and communication protocol selection. Since our taxonomy does not distinguish the detailed target architecture, we summarize the range of solution space as follows: (1) fixed architecture[4],[14]-[17] and (2) synthesizable architecture. Architecture synthesis consists of the following three sub-problems: (1) component selection[6]-[13],[21]-[24], (2) interface synthesis[21]-[24], (3) memory structure synthesis[28].

3Partitioning Problems Beyond Existent Algorithms

Real applications like multimedia applications have a large size of input tasks and constraints. An example is the H.263 encoding algorithm whose top-level task graph is displayed in figure 1. Distributor block in the task graph has data parallelism inside. Data parallel tasks can be executed sequentially, concurrently, or in a hybrid fashion. And, the graph has a feedback arc to make it a cyclic graph. If we apply graph transformation techniques such as retiming and pipelining to the cyclic graph, better design could be achieved. Since existent partitioning algorithms do not consider hybrid exploitation of data parallel blocks and graph transformation, the H.263 encoding algorithm is still outside the domain of current partitioning problem.

Figure 1. A task graph of H.263 encoding algorithm

Another example is found in networked applications. Some tasks have dynamic behaviors such as dynamic execution times, dynamic control paths, and dynamic loop counts even though previous researches assume static behavior of tasks.

Most previous partitioning approaches for the DHE design were based on well-formulated techniques such as integer linear programming, simulated annealing and genetic algorithm. While they are extensible to consider new design constraints or objectives, they are usually too time-consuming for large problem size. On the other hand, several heuristics have been developed for SOC to find a solution within reasonable CPU time. However, since they were developed for specific partitioning problems, they are hardly extensible to consider new constraints.

4Proposed Partitioning Framework

We introduce a fast and flexible partitioning algorithm[13] that covers from implementation selection, resource sharing to architecture selection. The proposed algorithm is not a simple partitioning algorithm but a partitioning framework that will be easily extended to consider new constraints.

As an example, we consider a new partitioning problem with two-sided timing constraints. We solved this problem by extending our partitioning framework quite easily. In this problem, a task may be assigned two-sided timing constraints: a release-time and a deadline. The execution of a node cannot begin until its release time and must complete by its deadline. This problem can be commonly found in embedded real-time applications.

The multi-function problem[9] can be reduced to a problem with two-sided timing constraints. In figure 2, there are two task graphs, A-B-C and D-E-F, that will be executed in the time-shared fashion. They have separate deadline constraints, t1 and t2, respectively. Figure 2 shows that these two task graphs can be merged to a single task graph with two-sided timing constraints. We set the release-time of node D to t1 and the deadline of node F to t1 + t2. Then, the partitioning problem with the merged graph is equivalent to the multifucntion problem, because two merged graphs will be scheduled separately within different time intervals.

We assume that the goal of the partitioning problem is to minimize the system cost while meeting all timing constraints.

Figure 2. A multi-function problem can be reduced to a problem with two-sided timing constraints.

4.1Partitioning Framework

In order to solve the problem with two-sided timing constraints, we extend the existent partitioning framework[13]. The partition framework consists of two main parts as shown in figure 3: a heterogeneous multiprocessor(HMP) scheduler and the task-PE(processing element) allocation controller. The inputs to the partitioning framework are input task graphs and a task-PE profile table that contains information such as the execution time and cost of a task on each processing element.

The scheduler schedules the task graphs to the processing elements in the shortest execution time based on the task-PE time table. On the other hand, the task-PE allocation controller modifies the task-PE time table at each iteration according to the design objectives. Since we want to minimize the cost of a design under time constraints, the task-PE allocation controller modifies the task-PE time table so that a task will not be scheduled on processing elements with high cost. The detailed discussion can be found elsewhere [13].

Figure 3. The Partitioning Framework

4.2Extension of Heterogeneous Multiprocessor Scheduling Algorithm

The problem with two-sided timing constraints could be handled by extending the BIL scheduler[32] that is adopted as the HMP scheduler in our partitioning framework.

The BIL scheduling algorithm is a list scheduling technique where a task is assigned a priority based on the best imaginary level (BIL). The BIL of a task is defined as follows.

BIL(i,j)=E(i,j)+ max[ min(BIL(d,j),min(BIL(d,k)+C(i,d)))]

(where i is i-th task, j is j-th processor, d is a child task of task i. E(i,j) is the execution time of task ion processor j and C(i,d) indicates the amount of IPC overhead between iand d.)

The BIL of taski on jrepresents the critical path length including the IPC overhead from task i to the end of the task graph based on the assumption that the task is scheduled on processor j and all descendant tasks are also scheduled on the processor with minimum BIL value.

In order to consider the deadline of each task, we revised BIL to BIL* which has a larger value if its deadline is earlier. The BIL is computed using child’s BIL* values of child nodes instead of BIL values.

BIL*(i,j) = max(BIL(i,j), deadline* – deadline(i) + E (i,j))

(where deadline* is the longest deadline in the given task graph.)

At each scheduling step, the scheduler compares the priorities of the runnable tasks. The priority of a runnable task is adjusted considering the processor available time. The adjusted level of a taski on processor j, BIM (Best Imaginary Makespan), is defined as follows:BIM(i,j) = T(i,j) + BIL(i,j), since the node i cannot be scheduled on processor j before T(i,j). Note that a runnable node has N different BIM values, one for each processor, if the total number of processors is N.

To support a release time constraint, T(i,j) is revised to T*(i,j) = max(T(i,j), release time(i)). BIM(i,j) is also redefined with T*(i,j) instead of T(i,j). T*(i,j) makes sure that the task is scheduled after its release time.

5Experiments

We compared our partitioning heuristic with an integer linear programming(ILP) solution for problems with two-sided constraints. Table 1 shows the performances for Prakash and Parker’s second task graph[22] and Hou’s task graphs[21]. “P&P(8)” means that Prakash and Parker’s task graph with the worst case finish time of eight time units. We assume that S1’s release time is 2 in P&P’s example and the release times of d1, a2, and i3 are 50, 10, and 30 in Hou’s examples. When they are applied to 1&2 and 3&4 versions of Hou’s task graphs, ILP cannot find an appropriate solution within 2 days since the problem sizes are too large to explore.

Example / No. of Tasks / ILP / Proposed algorithm
Price / CPU Time(s) / Price / CPU Time(s)
P&P(8) / 9 / 6 / 2.54 / 9 / 0.01
P&P(15) / 9 / 5 / 0.15 / 5 / 0.01
Hou1&2 / 20 / - / 170 / 0.13
Hou3&4 / 20 / - / 170 / 0.07

Table 1. ILP vs. Proposed Algorithm

6Conclusion

In this paper, we surveyed the previous partitioning researches to identify the domain of applications that they have tackled so far. In spite of extensive efforts and distinct achievements, we observed that existent partitioning algorithms still could not be applied to many real applications.

More extensive partitioning researches are expected to enlarge the application domain of automatic partitioning. Our proposed partitioning framework is shown to be a promising compromise between time-consuming approaches and hard-to-extend heuristics.

References

[1]W.H. Wolf, “Hardware-Software Codeisgn of Embedded systems,” Proceedings of the IEEE, vol. 82, pp. 967-989, July 1994.

[2]S. Edwards, L. Lavagno, E. A. Lee, and A. Sangiovanni-Vincentelli, “Design of Embedded Systems: Formal Models, Validation, and Synthesis,” Proceedings of the IEEE, vol. 85, no. 3, march 1997.

[3]S. Parameswaran, “HW-SW Co-Synthesis: The Present and The Future”, in ASPDAC, 1998.

[4]R. K. Gupta and G. De Micheli, “Hardware-Software Cosynthesis for Digital Systems”, IEEE Des. & Test Comput., vol. 10, no. 3, pp. 29-41, Sept. 1993.

[5]R. Ernst, J.Henkel, and Th. Benner, “Hardware-Software Cosynthesis for Micro-Controllers”, IEEE Des. & Test of Comput., vol. 10, no.4, pp. 64-75, Dec. 1993.

<System-On-Chip>

[6]J. Adams, and D. Thomas, “Multiple-Process Behavioral Synthesis for Mixed Hardware-Software Systems”, in ISSS, 1995.

[7]V. Srinivasan, S. Radhakrishnan, and R. Vemuri, “Hardware Software Partitioning with Integrated Hardware Design Space Exploration”, in DATE, 1998.

[8]A. Kalavade and E.A. Lee, "The Extended Partitioning Problem: Hardware/Software Mapping and Implementation-Bin Selection",in RSP, 1995.

[9]A. Kalavade, and P. Subrahmanyam, “Hardware/Software Partitioning for Multi-function Systems”, in ICCAD, 1997.

[10]E. Barros, W. Rosenstiel, and X. Xiong, “A Method for Partitioning UNITY Language in Hardware and Software”, in EuroDAC, 1994.

[11]P. Eles, Z. Peng, K. Kuchcinski, and A. Doboli, “Hardware/Software Partitioning with Iterative Improvement Heuristics”, in ISSS, 1996.

[12]I. Karkowski, and R.H.J.M. Otten, “An Automatic Hardware-Software Partitioner Based on the Possibilistic Programming”, in EDTC, 1996.

[13]H. Oh and S. Ha, “A Hardware-Software Cosynthesis Technique Based on Heterogeneous Multiprocessor Scheduling”, in Proc. Int. Workshop Hardware-Software Codesign(CODES) , 1999.

[14]J. Jeon, and K. Choi, “Loop Pipelining in Hardware-Software Partitioning”, in ASPDAC, 1998.

[15]N. Binh, M. Imai, A. Shiomi, and N. Hikichi, “A Hardware/Software Partitioning Algorithm for Pipelined Instruction Set Processor”, in EuroDAC, 1995.

[16]N. Binh, M. Imai, and A. Shiomi, “A New HW/SW Partitioning Algorithm for Synthesizing the Highest Performance Pipelined ASIPs with Multiple Identical FUs”, in EuroDAC, 1996.

[17]N. Binh, M. Imai, A. Shiomi, and N. Hikichi, “A Hardware/Software Partitioning Algorithm for Designing Pipelined ASIPs with Least Gate Counts”, in DAC, 1996.

[18]F. Vahid, “Procedure Cloning: A Transformation for Improved System-Level Functional Partitioning”, in EDTC, 1997.

[19]T.-Y. Yen, and W. Wolf, “Sensitivity-Driven Co-Sysnthesis of Distributed Embedded Systems”, in ISSS, 1995.

[20]S. Srinivasan, and N. Jha, “Hardware-Software Co-Synthesis of Fault-Tolerant Real-Time Distributed Embedded Systems”, in EuroDAC, 1995.

[21]J. Hou and W. Wolf, “Process Partitioning for Distributed Embedded Systems,”, Int. Workshop Hardware-Software Codesign(CODES), 1996.

[22]S. Prakash and A. Parker, “SOS: Synthesis of Application-Specific Heterogeneous Multiprocessor Systems.”Journal of Parallel Distributed Computing vol. 16, pp. 338-351, Dec. 1992.

[23]R. Dick, and N. Jha, “MOGAC: A Multiobjective Genetic Algorithm for the Co-Synthesis of Hardware-Software Embedded Systems”, in ICCAD, 1997.

[24]B. Dave, G. Lakshminarayana, and N. Jha, “COSYN: Hardware-Software Co-Synthesis of Embedded Systems”, in DAC, 1997.

[25]D. Kirovski, and M. Potkonjak, “System-Level Synthesis of Low-Power Hard Real-Time Systems”, in DAC, 1997.

[26]P. Chou, R. Ortega, and G. Borriello, “The Chinook Hardware/Software Co-Synthesis System”, in ISSS, 1995.

[27] U. Holtmann, and T. Benner, “Adaptation of Partitioning and High-Level Synthesis in Hardware/Software Co-Synthesis”, in ICCAD, 1994.

[28]A. Jantsch, P. Ellervee, J. Oberg, A. Hemani, and Tenhunen, “Hardware/Software Partitioning and Minimizing Memory Interface Traffic”, in EuroDAC, 1994.

[29]A. Bender, “MILP Based Task Mapping for Heterogeneous Multiprocessor Systems”, in EuroDAC, 1996.

[30]M. Schwiegershausen, H. Kropp, and P. Pirsch, “A System Level HW/SW Partitioning and Optimization Tool”, in EuroDAC, 1996.

[31]G. Marchioro, J. Daveau, and A. Jerraya, “Transformational Partitioning for Co-Design of Multiprocessor Systems”, in ICCAD, 1997.

<Other>

[32]H. Oh and S. Ha, “A Static Scheduling Heuristic for Heterogeneous Processors,” in EuroPar, 1996.