Master’s Thesis in Computer Science, with a focus on Real-Time Telecom Systems
Recording of Scheduling and Communication Events on Complex Telecom Systems
AUTHORS
Muhammad Imran Mughal
Razwan Javed
SUPERVISOR
Johan Kraft
19th June 2008
School of Innovation, Design and Engineering
Mälardalen University
Note
Due to business secrecy, the companies involved in this thesis may not be revealed. Company A is a consultancy firm, providing services for Company B, a major telecom company.
Abstract
Monitoring and debugging a real-time system is complicated by the lack of advanced tools and of adequate operating system support. Software tools can cover the whole software development life cycle, from requirements analysis to debugging and maintenance. However, the tools available for developing traditional PC software are not sufficient for developing real-time systems. Real-time software tools and effective kernel support are essential, since they allow developers of real-time systems to better understand and troubleshoot their systems.
The Industrial Research and Innovation Lab at MRTC (Mälardalen Real-Time Center) contains a real telecom system, both hardware and software. The software system is based on the real-time operating system OSE. The telecom applications that run on this system are often highly complex and hard to troubleshoot, since they are often distributed over different nodes and have strict performance requirements, which makes monitoring hard.
Company A and Company B are therefore interested in tools for monitoring the CPU usage and behavior of the system at runtime, for instance the communication and timing of the various processes in the system. Closely related research is carried out at MRTC, but not yet on OSE-based systems. This thesis was formulated to find a suitable technical solution for evaluating the Tracealyzer tool on OSE. We have investigated existing tools, implemented a recorder for the Tracealyzer tool, and evaluated it under realistic conditions. The evaluation showed that the CPU usage of the recorder was quite low, about 1% at 30% CPU load.
Acknowledgements
This is a master's thesis in the field of Computer Science and Electronics. The work presented in this thesis was done at the Industrial Research and Innovation Lab at MRTC (Mälardalen Real-Time Center) at Mälardalen University in Västerås.
We are very thankful to “Company A” for contributing to this project by allowing us to do measurements on their hardware.
Great thanks go to Ola Liljedahl, Director of Product Enablement and Product Management, ENEA USA, for helping us with different investigations about OSE and answering a lot of questions by email.
We would like to express our gratitude to our supervisor Johan Kraft for his excellent guidance and constant support during this work. He dedicated much time to supervising us and pointed us in the right direction. His advice has been of vital importance during our work.
We also wish to thank our lab colleague Mikael Krekola for his companionship and discussions, and for his technical support and kind help with this thesis work. We are also thankful to Daniel Flemström, the Lab manager at MDH.
Finally, we want to express our gratitude to our parents for their love and support.
Table of Contents
Introduction
1.1 Background
1.2 Problem Definition
1.3 Purpose
1.4 Objectives
1.5 Method
1.6 Thesis Overview
Theoretical Background
2.1 Company B’s platform
2.2 Embedded and Real-Time Systems
2.3 Real-Time Operating Systems (RTOS)
2.4 The OSE Operating System
2.4.1 History of OSE
2.4.2 The OSE Family
2.4.3 OSE Concepts
2.4.3.1 Kernel
2.4.3.2 Process Management
2.4.3.3 Interprocess Communication (IPC)
2.4.3.4 Memory Management
2.4.3.5 OSE Kernel Handlers
Monitoring and Debugging of Real-Time Systems
3.1 Monitoring Systems
3.1.1 Terms and Notations
3.1.2 Application Domains
3.1.3 Probe Effect
3.1.4 Monitoring Levels
3.1.4.1 Process-Level Monitoring
3.1.4.2 Function-Level Monitoring
3.1.5 Monitoring Targets
3.1.6 Monitoring Approaches
3.2 Debugging Systems
3.2.1 Static Debugging
3.2.2 Dynamic Debugging
3.2.2.1 Cyclic Debugging
3.2.2.2 Event-Based and Post-Mortem Debugging
3.2.2.3 Relative Debugging
3.2.3 Debugging with Monitoring Support
Related Work
4.1 The Tracealyzer
4.2 Wind River Workbench
4.3 OSE Illuminator
4.4 LTT and relayfs
4.5 ART Real-Time Monitor
4.6 TraceX
4.7 Dynamic Probes (DProbes)
4.8 Kernel Probes (kProbes)
4.9 DTrace
4.10 Conclusions
Recording of Scheduling and Communication Events in OSE
5.1 Technical Requirements of Tracealyzer
5.2 How to Get Tracealyzer Requirements in OSE
5.3 Methods for Recording of Scheduling Events in OSE
5.3.1 The Kernel Handler Implementation Method
5.3.2 OSE Kernel Modification
5.4 Methods Evaluation
5.4.1 Analysis
5.4.2 Evaluation and Choice
Implementation
6.1 Design Issues
6.2 Tools and Project Structure Setting
6.3 Memory Configuration
6.4 Kernel Handlers
Investigation of CPU Overhead of Recorder
7.1 Execution time of Probe
7.2 Measure number of context switches
7.3 CPU usage of create_handler_process
7.4 CPU overhead of Recorder
Conclusions and Future Work
8.1 Solution
8.2 Limitations
8.3 Future Work
References
Index of Figures
(Figures 1 and 2 removed from public version of report)
Figure 3: Real-time embedded systems
Figure 4: Deadline example in distributed system
Figure 5: Possible process states in OSE
Figure 6: Memory organization in OSE
Figure 7: Tracealyzer tool with a task switch view
Figure 8: CPU Usage graph
Figure 9: Wind River Workbench’s graphical interface and log files view
Figure 10: The Event Viewer displays a trace of all sent signals in the current state.
Figure 11: The CPU Profiler displays the CPU load of an application.
Figure 12: Graphical Viewing Tool
Figure 13: Real-Time Monitor’s Visualizer
Figure 14: Visualized System Behavior
Figure 15: TraceX Visualizer
Figure 16: Kernel handler implementation flow chart
Figure 17: Project Structure
Figure 18: Kernel handler and user area activation
Figure 19: User area configuration
Figure 20: Create handler flow chart
Figure 21: Kill handler flow chart
Figure 22: create_handler_process
Figure 23: Explanation of functions in create_handler_process flow chart
Figure 24: Swap in handler flow chart
Chapter 1
Introduction
1.1 Background
Embedded real-time systems are now so widespread that they can be found everywhere in everyday life. In many industrial areas the use of real-time systems is a necessity. Some examples are telecommunication, computer-aided production in factories, computer-controlled power and nuclear plants, and the vehicular and aerospace industries.
Company A develops parts of Company B’s base software platform for telecom systems. The platform targets distributed systems with a network of cooperating nodes, which together make up the system functionality.
Developing real-time systems takes much more time than developing non-real-time systems, because testing and debugging real-time systems is much more difficult, for several reasons. It is usually not possible to use debuggers in a distributed real-time system: if a single CPU is stopped by the debugger, processes running on other CPUs will probably fail due to a timeout. Monitoring is a solution that does not halt the system, but only records the behavior for later presentation. A common problem when monitoring embedded systems is performance; the CPU usage of the recorder may be unacceptable. Another difficulty comes from the invasive nature of real-time monitoring in a distributed environment: it interferes not only with processor scheduling, but also with communication scheduling and activities [43]. The lack of kernel support makes monitoring and debugging of a distributed real-time system even more complicated.
Two types of errors are often encountered when monitoring or debugging a distributed real-time program: logical errors and timing errors. Both are very difficult to track down in a real-time environment. In a distributed real-time environment in particular, the lack of an instantaneous, accurate global state or event ordering adds extra complexity to the analysis of program behavior.
Monitoring is often used to measure CPU usage. Two types of CPU usage measurement tools are commonly available: hardware tools, such as logic analyzers, and software tools, such as profilers. The problem with hardware monitors is that they are very expensive and provide overly detailed information, which makes it hard to get an overview of the behavior. Software profilers typically collect data in one of two basic ways: instrumentation and sampling. With instrumentation, the profiler injects code at key points of the system (such as at the top of every method call); this code then records the event at run-time for subsequent analysis by the profiler. With sampling, the system remains unmodified and is instead inspected periodically by the profiler at run-time, which yields metrics such as the amount of CPU time used by processes and/or functions.
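To illustrate the difference between the two data collection approaches, the following is a minimal Python sketch. It is not taken from the thesis work: the schedule, the process names (proc_a, proc_b, idle) and the tick counts are invented for illustration, and the sampler picks random points in time as a simple stand-in for a periodic sampling profiler.

```python
import random

# Hypothetical ground-truth schedule: (process name, run length in ticks).
# Names and durations are made up for illustration only.
SCHEDULE = [("idle", 7), ("proc_a", 2), ("idle", 5),
            ("proc_b", 1), ("proc_a", 3), ("idle", 2)] * 50


def build_timeline(schedule):
    """Expand the schedule into one entry per tick: which process runs."""
    timeline = []
    for name, length in schedule:
        timeline.extend([name] * length)
    return timeline


def usage_by_instrumentation(schedule):
    """Exact usage: every scheduling event is recorded, so run times add up."""
    totals = {}
    for name, length in schedule:
        totals[name] = totals.get(name, 0) + length
    total = sum(totals.values())
    return {name: ticks / total for name, ticks in totals.items()}


def usage_by_sampling(timeline, n_samples, rng):
    """Estimated usage: check which process runs at randomly chosen ticks."""
    counts = {}
    for _ in range(n_samples):
        name = timeline[rng.randrange(len(timeline))]
        counts[name] = counts.get(name, 0) + 1
    return {name: hits / n_samples for name, hits in counts.items()}


exact = usage_by_instrumentation(SCHEDULE)
timeline = build_timeline(SCHEDULE)
# Two sampling runs over the same execution, using different random seeds.
estimate_1 = usage_by_sampling(timeline, 100, random.Random(1))
estimate_2 = usage_by_sampling(timeline, 100, random.Random(2))
```

In this sketch the instrumentation-based figure is the same for every run over the same recording, while the sampled estimates depend on where the samples happen to fall and generally vary between runs.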
Although they have proven successful for analyzing desktop software and parallel computers, instrumentation and sampling can be enormously problematic for real-time systems. The fundamental flaw is that the very act of observation disturbs the system’s real-time characteristics in a way that may cause missed deadlines and scheduling conflicts that would not occur under normal execution. With instrumentation, this problem can be avoided by leaving the instrumentation in the release system, so that it becomes a part of the tested system.
This approach is used by the Tracealyzer, a viewer and analysis tool for embedded real-time systems. It presents the system behavior (from a recording) at a higher level of abstraction compared to debuggers, with focus on process scheduling and communication. This is very helpful for understanding the behavior of complex embedded systems. Tracealyzer is also very useful for getting an overview of a system's resource usage, for instance CPU-usage in total and per process. The Tracealyzer uses instrumentation manually inserted in the base software platform. The application software does not have to be modified.
This chapter introduces the reader to the key features of the thesis, along with the problem definition and the methods used to address it. The chapter ends with a thesis overview that briefly describes the content of each chapter.
1.2 Problem Definition
The CPU-usage measurement tools used by Company A are not accurate enough, as they use a sampling approach and give different results from one run to the next. Company A needs a tool that can be used to monitor their real-time systems. They especially need the ability to accurately measure the CPU usage of each process and of the entire system. There are several existing commercial tools for logging information from systems and showing the result either graphically or textually, and these should be investigated. The Tracealyzer tool is the primary candidate among them, due to its availability and the experience with the tool on other systems. An important issue is the amount of CPU time that is required for the monitoring, i.e. the overhead.
1.3 Purpose
The main business goal with Tracealyzer, from Company A’s point of view, is the possibility of using Tracealyzer for testing activities. This thesis investigates the possibility of using the Tracealyzer tool for visualizing the behavior, timing and CPU usage of software running on the OSE platform. The focus lies on CPU usage and on the visualization of context switches between processes, so that engineers can see if a process consumes more CPU cycles than expected. This can facilitate troubleshooting and give developers a better understanding of the system.
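As an illustration of how recorded context switches can reveal a process that consumes more CPU than expected, the following Python sketch computes per-process CPU shares from a list of swap-in events. It is a simplified model, not the Tracealyzer recorder format: the event layout (timestamp, process name), the function names and the budget values are assumptions made for this example.

```python
from collections import defaultdict


def cpu_usage_from_switches(events, end_time):
    """Return {process: fraction of CPU time} from swap-in events.

    events   -- list of (timestamp, process) pairs sorted by timestamp; each
                pair says that `process` started running at `timestamp`.
    end_time -- timestamp at which the recording stopped.
    """
    if not events:
        return {}
    total = end_time - events[0][0]
    if total <= 0:
        raise ValueError("end_time must be later than the first event")
    run_time = defaultdict(int)
    # A process runs from its swap-in until the next swap-in (or the end).
    following = events[1:] + [(end_time, None)]
    for (start, name), (next_start, _) in zip(events, following):
        run_time[name] += next_start - start
    return {name: ticks / total for name, ticks in run_time.items()}


def over_budget(usage, budgets):
    """List processes whose measured share exceeds their expected budget."""
    return sorted(name for name, share in usage.items()
                  if share > budgets.get(name, 1.0))


# Hypothetical recording: timestamps in ticks, invented process names.
events = [(0, "idle"), (40, "proc_a"), (55, "idle"),
          (70, "proc_b"), (75, "proc_a"), (90, "idle")]
usage = cpu_usage_from_switches(events, end_time=100)
# Assumed budgets; processes without a budget are never flagged.
flagged = over_budget(usage, {"proc_a": 0.25, "proc_b": 0.10})
```

With this invented recording, proc_a runs for 30 of 100 ticks and is flagged against its assumed budget of 25%, while proc_b stays within its budget.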
1.4 Objectives
The stated problem has been broken down into the following questions:
- What information is required by the Tracealyzer recorder to measure CPU-usage and display the system behavior?
- Is it possible to record the information needed by the Tracealyzer from OSE to measure CPU-usage, preferably without changing the OSE kernel code?
- What is the impact of this type of recording, in a typical case and in a high load situation?
- What is the execution time of a probe?
- What are the rates of different types of events?
- Is the impact of the recording acceptable? If not, how can it be reduced?
- Is it possible to select what events to record, to reduce the number of probes that are executed?
- Is it possible to reduce the execution time of a probe?
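The questions above are connected by a simple relation: the CPU overhead of the recorder is roughly the sum, over all recorded event types, of the event rate multiplied by the probe execution time. The following Python sketch shows this estimate under stated assumptions. The event types, rates and probe times are invented placeholders, not measurements from this thesis, and the function name is hypothetical.

```python
def recorder_overhead(event_rates_hz, probe_time_s, selected=None):
    """Estimate the fraction of CPU time spent executing probes.

    event_rates_hz -- {event type: events per second}
    probe_time_s   -- {event type: probe execution time in seconds}
    selected       -- optional set of event types to record; if None,
                      all event types are recorded.
    """
    kinds = event_rates_hz.keys() if selected is None else selected
    return sum(event_rates_hz[kind] * probe_time_s[kind] for kind in kinds)


# Placeholder figures for illustration only.
rates = {"swap_in": 2000.0, "create": 20.0, "kill": 20.0}
probe_times = {"swap_in": 3e-6, "create": 10e-6, "kill": 10e-6}

all_events = recorder_overhead(rates, probe_times)
only_swaps = recorder_overhead(rates, probe_times, selected={"swap_in"})
faster_probe = recorder_overhead(rates, dict(probe_times, swap_in=1.5e-6))
```

The sketch reflects the two ways of reducing the impact named in the questions: recording fewer event types lowers the number of probes executed, and a shorter probe lowers the cost of each one.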
1.5 Method
The thesis project began with a study of the Tracealyzer tool, of how recordings are visualized by Tracealyzer, and of the technical area of real-time operating systems. The OSE real-time operating system from ENEA, which is used by the involved companies, was studied in particular. Possible methods for recording scheduling events in OSE were discussed with Company A and with ENEA personnel with great experience in OSE. These discussions led to a study of the kernel handlers in OSE.
To measure CPU usage and display the internal behavior of the system, a software recorder compatible with the Tracealyzer tool was implemented on OSE. Finally, the recorder was evaluated on the test cases provided by Company A. The purpose of the evaluation was to see how much the recorder affects the behavior of the target system and how much CPU time it uses.
1.6 Thesis Overview
This master's thesis report covers basic information on operating systems and real-time operating systems, and specific information about the OSE RTOS from ENEA. These basics are necessary for understanding the thesis task. The structure of the thesis report is outlined as follows:
Chapter 1: Thesis Introduction, Problem Definition and Method
Chapter 2: Theoretical Background
Chapter 3: Monitoring and Debugging of Real-Time Systems
Chapter 4: Related Work
Chapter 5: Recording of Scheduling and Communication Events in OSE
Chapter 6: Implementation
Chapter 7: Investigation of CPU Overhead of Tracealyzer Recorder
Chapter 8: Conclusions and Future Work
Chapter 2
Theoretical Background
2.1 Company B’s platform
(removed from public version of report)
2.2 Embedded and Real-Time Systems
Growing demands on functionality in today's electronic products are leading to an increasing shift towards developing systems in software and programmable hardware in order to increase design flexibility. Most of today's electronic products are based around programmable integrated circuits. These integrated circuits are usually embedded as part of a complete device including hardware and mechanical parts. In contrast to an embedded system, a general-purpose computer, such as a personal computer, can perform many different tasks depending on its programming. A general definition of embedded systems is: