Ph.D. Student Seminar Ph.D. Status Report

November 19, 2003

Software Engineering group, IDI, NTNU

Ph.D. Status Report

Name Surname

Torgrim Lauritsen

Supervisors

Tor Stålhane, Reidar Conradi and Torbjørn Skramstad

Year of Ph.D.

First year

Title of Ph.D.

‘A method for developing safe business-critical software’

Description of Ph.D. work

Motivation/background

Many software projects that develop business-critical systems have problems delivering good code on time. Good here means fault-free and hazard-free code. Development would benefit from work processes that have been used successfully in other industries for many years.

Most methods and techniques that can handle such problems efficiently are found within traditional systems safety analysis. These methods have, however, largely been ignored in the general software community both in Norway and most other countries.

Research theme/focus

I want to focus on several of the methods from traditional systems safety analysis, together with UML and RUP, and try to make a small but effective method that the IT industry in Norway can use. This method will contribute to process improvement for developing safe business-critical software.

Research design

I will conduct a larger-scale questionnaire survey, sent to approximately 100 businesses. The survey will ask which parts of RUP each business uses, and why those parts and not the others. It will also ask how they handle communication with the customer, what their test procedures are, and how they intend to reach the goal of delivering the product according to their contractual obligations.

When I have the results, I will propose a method and ask some of the surveyed businesses to try it out in a small project to see how it works.

Preliminary results

I carried out a preliminary study this fall (2003) of problems in the Norwegian IT industry, in cooperation with my co-workers in the BUCS project.

Preliminary conclusion

Everyone we interviewed in our preliminary study talked about problems with delivering the software product by the contractual deadline (time to market) and about trouble communicating with their customers.

Most of the businesses we interviewed said that they use or are going to use parts of RUP and UML. A new method for developing safe business-critical software must therefore fit into the RUP and UML framework.

Open issues

I will look further into the methods that I have been studying this fall, and also look into RUP and UML in more detail.

Status

In this section you record your current status and projected plan.

Papers matrix

Title of the paper / Publication / Future

Credit plan

Course title / Term
TDT 4235: Quality assurance and Software process improvement / Fall 2003
DT 8108: Topics in information technology / Fall 2003 / Spring 2004
DIXIL-01: ”Utvikling av sikkerhetskritiske systemer” (Development of safety-critical systems) / Fall 2003
DT 8111: Empirical software development / Fall 2004
DT 8100: Object-oriented software development / Spring 2005

Credit status

Activity last 6 months

I started on 1 July and have become familiar with my co-workers and my supervisor. I have studied Cause-Consequence Analysis (CCA), Event Tree Analysis (ETA), Fault Hazard Analysis (FHA), Failure Modes and Effects Analysis (FMEA), and Hazard and Operability Analysis (HAZOP), in addition to Management Oversight and Risk Tree analysis (MORT).

We have discussed issues around business-critical software development, especially when we prepared the preliminary study of problems in the Norwegian IT industry. We visited eight companies and interviewed people there about the problems they face when they develop business-critical software.

I have read two books:

- Safeware – System Safety and Computers by Nancy G. Leveson

- System Safety: HAZOP and Software HAZOP by Redmill, Chudleigh and Catmur

I have taken two courses:

DIF 8916 – Topics in Information Technology, where we have had some lessons on how to write articles and have attended some guest lectures, and

TDT4235 - Quality Assurance and Software Process Improvement, where we learn about standards in the IT industry and how to improve software quality in general. This course introduces methods and standards I will look further into in the BUCS project:

Standards like IEEE 730, The International Electrotechnical Commission (IEC) standard 61508, ISO 9001 and ISO 9000-3, which will be useful in quality assurance work.

Methods for software process improvement, such as Total Quality Management (TQM), gap analysis, Goal Question Metric (GQM), Rapid Process Improvement (RPI), ISO 9126 (Factor Criteria Metrics (FCM)), Capability Maturity Model (CMM), Experience Factory (EF), the Delphi method, Post-Mortem Analysis (PMA) and the Deming cycle.

I have attended several seminars here at NTNU:

· “Science in public” 18/09-03 by British professor Steve Miller, University College London,

· “The ATAM method” 16/09-03 by T-E Fægri and G. K. Hansen from SINTEF,

· “Safety and patterns seminar” 10/10 by Kai Hansen from ABB,

· “Extend - a simulation tool” 21/10-03 held by Dr. ing. Siegfried Eisinger from Det Norske Veritas.

The last three were joint BUCS / Websys seminars.

I attended a seminar in Stockholm on 20–21 August 2003, where Nancy G. Leveson lectured on “Software System Safety”. My study partner and I have written a summary of this seminar.

Activity next 6 months

I will take the Empirical software development course next semester.

Through what I have learned I have come to the conclusion that the field of developing business-critical software is very complex.

Projects are often underestimated and delivered later than stated in the requirement specification agreement. In addition, the software produced is not fault-free. The question is why, and what traditional systems safety analysis methods can do to prevent this. This is what I am going to look further into in my study.

In the material I have studied so far, the most promising idea seems to be to assign people trained in software safety to monitor, from early in the development project, the hazards and fault situations that may arise.

When the requirement specification agreement is finished, and before software development starts, the safety team must inspect it to see whether they can find any hazards or fault situations that may arise during development. These may stem from the architecture or from restrictions and limitations in the environment. Be aware that the environment is always shifting, both internally and externally. The same search applies whether new software is being developed or existing software is being maintained.

In this way the developers are made aware of the hazards and fault situations that may arise, and they can isolate them by building barriers around them. They thus have control both during development and after the program is put into use.

This will be expensive, but the software development business will earn it back, because it can deliver hazard-free and fault-free programs. Businesses must be helped to weigh the costs against the benefits.

It is easier and cheaper to build barriers during development than to add them later, when the software product is in use at the customer’s site. This is an advantage a business can offer both to today’s customers and to future ones. A good reputation spreads in the market and will make software development businesses more profitable.

Unfortunately, it is not always possible to guarantee that the software cannot reach a hazardous state; sometimes a hazardous state is unavoidable. Knowing this, the software developers can take steps to minimize the risk associated with the hazard, such as minimizing exposure or adding system safeguards to protect the system against such states. Hazards can also arise from deviations from the designer’s intention.

The safety team has specific tasks:

- Develop test plans

- Perform hazard analyses

- Be aware of safety standards

- Run through checklists which may be based on historical data

At the beginning of a new project, you talk to the customer and together you write a requirement specification agreement. You have to be aware of communication problems that may arise. Software developers tend to focus on the technology, which they see as a challenge, and not on the system they are going to develop. They do not understand what problems they are causing when they build software that does not work for their customers.

Another problem is that the software developers do not have enough background knowledge about the domain they are building a software tool for. Every specialized domain has its own technical terms.

How can we improve the communication between the customers and the software developers, so that we reduce the uncertainty about what we are going to make? To avoid costly redesign and recoding, the requirements specification and analysis should be as complete as possible as early as possible. Realistically, however, some of the analysis will be put off or redone as the software and system development proceeds.

In a dynamic world, business processes will always change. The requirements of many projects are not complete before software development begins. It is therefore important that we can develop new systems (and maintain old systems) step by step.

In addition, changes are often made as the design of the other parts of the system becomes more detailed and problems are found that necessitate changes in the desired software behaviour. It is therefore unlikely that the analysis will be completed before software design begins.

Requests change: new wishes arise and others are dropped. When the requests change, a new hazard analysis is needed before the developers can continue with their work, because some of the changes may cause new hazards. The safety team therefore has to work alongside the customer.
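The rule that a changed request blocks further development until the affected hazards have been re-analysed can be illustrated with a small traceability log. This is only a sketch: the class `HazardLog`, its method names and the identifiers `R1`, `H1` and so on are hypothetical and are not part of any of the methods studied in this report.

```python
from collections import defaultdict


class HazardLog:
    """Hypothetical log that traces requirements to known hazards."""

    def __init__(self):
        self._hazards_by_req = defaultdict(set)  # requirement id -> hazard ids
        self._needs_reanalysis = set()           # hazards awaiting a new analysis

    def link(self, requirement_id, hazard_id):
        """Record that a hazard was identified for a requirement."""
        self._hazards_by_req[requirement_id].add(hazard_id)

    def requirement_changed(self, requirement_id):
        """Flag every hazard traced to the changed requirement."""
        affected = set(self._hazards_by_req.get(requirement_id, ()))
        self._needs_reanalysis |= affected
        return affected

    def mark_analysed(self, hazard_id):
        """The safety team has re-analysed this hazard."""
        self._needs_reanalysis.discard(hazard_id)

    def development_may_continue(self):
        """True only when no flagged hazard is waiting for re-analysis."""
        return not self._needs_reanalysis
```

A log like this can only flag hazards that are already known. Finding the new hazards that a change introduces still requires the safety team to analyse the changed request together with the customer.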

I will start by describing well-known safety methods: Cause-Consequence Analysis (CCA), Event Tree Analysis (ETA), Fault Hazard Analysis (FHA), Failure Modes and Effects Analysis (FMEA), Hazard and Operability Analysis (HAZOP), Management Oversight and Risk Tree analysis (MORT) and Preliminary Hazard Analysis (PHA).

After that, I will develop a small, but effective method from these methods, and try it out in a pilot project possibly at St. Olavs Hospital in Trondheim in cooperation with Helse Midt-Norge.

It is important that the new method is small and effective, so that the software developers have the time to use it. It must also be easy to learn and to use, and fit well with the methods currently used in the IT industry, such as RUP and UML.

Because business processes will always change, perhaps eXtreme Programming (XP) is the solution? On many projects, requirements are not complete before software development begins. It is therefore important that we can develop new systems (and maintain existing ones) step by step, because this gives a chance to adjust along the way. XP is forward-looking in that it can be used to develop applications even when you do not know at the outset whether they will work at completion time. During development, you show the customer the product as it stands, and you can incorporate the feedback you get from the customer.

SW engineers have a great deal of knowledge to contribute to system-safety efforts. No nontrivial software exists that will function as desired under all conditions, no matter what changes occur in the other components of the system. Therefore, all system designs need to consider the consequences of SW errors and build protection against them. The design must eliminate or control the specific hazards identified for a particular system.

Risk is a function of

(1) the likelihood of a hazard occurring,

(2) the likelihood of the hazard leading to an accident (including duration and exposure), and

(3) the severity or consequences of the accident.

A design can be made safer by reducing any or all of these three factors. System safety guidelines suggest the following risk reduction procedures:

Hazard elimination:

- Substitution

- Simplification

- Decoupling

- Elimination of specific human errors

- Reduction of hazardous materials or conditions

Hazard reduction:

- Design for controllability

- Barriers

- Lockouts

- Lockins

- Interlocks

- Failure minimization

- Safety factors and safety margins

- Redundancy

Hazard control:

- Reducing exposure

- Isolation and containment

- Protection systems and fail-safe design
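The three risk factors and the effect of the procedures listed above can be illustrated with a small calculation. The text only says that risk is a function of the three factors; combining them by multiplication is an assumption made here for illustration, and the class `Hazard`, its field names and the numbers in the example are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hazard:
    name: str
    p_hazard: float    # (1) likelihood of the hazard occurring, 0..1
    p_accident: float  # (2) likelihood that the hazard leads to an accident, 0..1
    severity: float    # (3) severity of the accident, in arbitrary cost units

    def risk(self) -> float:
        # Assumption: the three factors combine multiplicatively.
        return self.p_hazard * self.p_accident * self.severity


def rank_by_risk(hazards):
    """Return the hazards sorted from highest to lowest risk score."""
    return sorted(hazards, key=lambda h: h.risk(), reverse=True)


if __name__ == "__main__":
    # Hypothetical hazard with made-up numbers.
    wrong_dose = Hazard("wrong dose shown", p_hazard=0.5, p_accident=0.25, severity=8.0)
    # Hazard control (e.g. a protection system) lowers factor (2) only.
    controlled = replace(wrong_dose, p_accident=0.125)
    print(wrong_dose.risk(), controlled.risk())
```

In this model, hazard elimination drives factor (1) to zero, hazard reduction lowers factor (1), and hazard control lowers factor (2) by making an accident less likely once the hazard has occurred.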

GUI:

Building a reliable and error-free complex interface in software is difficult, but it seems that this lesson has not yet been learned by many engineers. Reducing and simplifying interfaces will reduce risk. For example, does the model-view-controller (MVC) technique help in developing risk-free interfaces? Interface problems often lie in the control systems; thus a basic design principle is that control systems should not be split into pieces.

Test:

How do the software developers test their software? Do they write their own tests, or do they have a test department that takes care of all testing? Do the IT businesses inspect their own work, and if so, how? Do they use checklists? And can I make a method that fits their existing inspection schemes, or should I develop a new method?

Psychology:

Psychology is an important aspect of improving methods for developing business-critical software. During my time as a software system developer, I experienced many situations where psychological aspects affected the project I was involved in. The psychological aspect of software development is strongly under-emphasized.

Communication problems:

As mentioned above, there are communication problems with the customer: software developers focus heavily on the technology and do not put enough emphasis on the requirements of the system they are going to develop.

Psychology factors inwards:

We will also look at psychological factors within the software development group. How do the developers react when they are asked to analyze their own situation? It is important that I introduce a new method by showing them that it is for their own good, and let them be part of the analysis team after they have tried out the new method.

Do software developers understand what problems and frustrations they are causing when they are making software?

Is it a problem that the software developers do not understand what problems they are causing when they build software that does not work well enough for their customers?

The environment is always shifting:

Are the software developers aware that the environment is always shifting, both internally and externally? How do they deal with this?