Explore – exception handling verification
Motivation and scope
From all our Background – everything we know about BPMN and about the needs of business analysts – we need to pick our scope. So, let's sit down, make ourselves a nice hot cup of tea and reconsider the questions mentioned in the introduction. Aha! What if the kettle is malfunctioning? Did you stop and consider that?! (Or what if you don't have a kettle because your wife prefers a cauldron?) Without a kettle, how can you make tea? And without tea, how can you read the problems you are reading at this very moment…
Seriously, such scenarios are our motivation to look into possible errors in the execution. There is no need for me to shout from the rooftops that static analysis is important. We have seen that it saves a huge amount of time and money. Is it the most important aspect of BPMN? Not necessarily. The Prosero prototype did not see it as the top issue. Like Prosero, many feel that BPMN-to-BPEL transformation is a much bigger issue. I could argue from my own love for languages and the importance of a coherent statement. But even without arguing, I can tell you that many have researched the transformation issue and, as we have seen, have left static analysis at the level of syntax verification. So there is much to contribute on static verification – as part of the effort to help Business Process Diagram modelers.
Still, the importance of BPEL does not elude us. We know that BPMN diagrams are not meant just to be hung on walls; they are meant to be executed. Therefore we want to contribute to making the execution as smooth as possible, meaning either to avoid exceptions or to handle them. And I will make it clear time and time again: I do not care whether the one executing the process is a machine or a man. Clearly, a machine won't handle an unknown condition as well as a human being, but even a manager will save time and money if his employees can follow a sequence of instructions without stopping to call for help in unexpected scenarios.
That is the major responsibility of the business analyst when writing a business process: to make it robust by avoiding or handling unexpected scenarios. This is part of his responsibility to write coherent business processes. We will touch on other aspects of coherent diagrams, such as clarity, much later. For now we are focused on exceptions.
To sum up the focus of our problem: our initial scope is improving modeling tools. We have already concluded that good static analysis is a huge time saver – finding a problem in your process at design time, before you allocate resources for its execution, is very beneficial. That leaves the question: which aspects of static verification should we improve, to help us model robust business processes? The answer is that we want to identify pitfalls in the process that the business analyst missed, and inform him of the mistake. The most dangerous pitfall is reaching an unexpected condition during execution – causing an exception. In other words, the most dangerous pitfall is an unhandled exception.
Static verification
Let's review an example of a problem the Prosero modeling tool does (!!) statically verify – type matching. If a certain Task returns an object of type Happy-Document, the process modeler must model a Happy-Document being received from the Task. If the tool did not force the user (at design time) to receive a Happy-Document object, and the user mistakenly used a different type of object, that could lead to serious problems at run-time. Moreover, knowing the object's type makes it easier for the user to handle the object, since the object's attributes are then also known.
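To make the idea concrete, here is a minimal sketch of such a design-time type check in Python (this is not Prosero's actual code; the second type name and the helper are purely illustrative):

```python
# Illustrative sketch only: a design-time check that the variable prepared to
# receive a Task's output has the same type the Task declares it returns.

def check_return_type(declared_output_type, receiving_variable_type):
    """Return a list of design-time warnings; an empty list means the types match."""
    if declared_output_type != receiving_variable_type:
        return ["Task returns '%s' but the process receives it into a '%s' variable."
                % (declared_output_type, receiving_variable_type)]
    return []

# Example: the Task returns a Happy-Document, but the modeler prepared
# a variable of a different (hypothetical) type.
for warning in check_return_type("Happy-Document", "Sad-Document"):
    print("Design-time warning:", warning)
```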
Like a returned object, an exception is a possible outcome of a Task. For example, you call a Task to get a Happy-Document named "Transformers", and you have prepared an object to receive the Happy-Document. But at execution time, the Task returns an Error Event – 'The document was not found'. In the above diagram (4.4) the Process does not "know" what to do with such a response, so it simply stops and passes the error "upwards", while in the next diagram (4.5) the Error is caught and handled.
Here begins our research – can static verification really identify unhandled exceptions that the clever business analyst missed?
Verifying Error Handling in a BPMN Process mainly amounts to the following (a minimal sketch of the matching step follows the list):
- For each Activity:
- Identify possible Error Events
- Identify its Handlers
- Match these two lists.
- Error Events without matching Handlers are unhandled exceptions this Process will throw, at run time.
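As a first approximation, the matching step is a simple set difference. A minimal sketch in Python (the Activity structure and error names are assumptions for illustration, not taken from any particular tool):

```python
# Minimal sketch of the matching step: the errors an Activity may throw,
# minus the errors its attached handlers catch.

def unhandled_errors(possible_errors, handled_errors):
    """Error names the Activity may throw that no attached handler catches."""
    return set(possible_errors) - set(handled_errors)

# Example for a single Activity (error names are illustrative):
errors = {"Document Not Found", "Payment Denied"}
handlers = {"Document Not Found"}
print(unhandled_errors(errors, handlers))  # -> {'Payment Denied'}
```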
We know that BPMN has an Error Intermediate Event that can catch an error and pass it to handling. So this gives the first problem to consider:
(1) How to force the user to use an Error Intermediate Event for error-handling, when it is required?
Which leads to the obvious problem:
(2) How to identify when error handling is required? That is, how to identify an unhandled Error Event.
And since we don't believe in force (but do believe in "the force"), we also consider the question:
(3) Instead of forcing the user, can the tool add the error-handling for him?
And before we can even begin considering these questions, we must ask ourselves,
(4) Are there many types of errors? Do we handle all of them? Can they be handled?
Types of exceptions
What's in a name? That which we call an Exception
By any other name would fail the execution.
(ahmm… not quite Shakespeare)
First, we ask ourselves what exceptions we should identify. There are two independent dimensions to consider: the exception's type, and whether or not the exception was specified in the process.
We divide the exception types into two main classifications: Process Exceptions and Data Exceptions (a small sketch of this taxonomy as a data structure follows the list). The main difference is that Process Exceptions can happen to any Task and any Process and, unless declared by the user, cannot be statically verified by the tool.
· Process Exceptions
- Failure
§ Generic infrastructure failure stemming from low levels that cannot be anticipated at BPMN level.
§ Analogous to "Work Item Failure" in [workflowpatterns].
- Timeout
§ An Activity is taking too long to finish, according to system settings.
§ Analogous to "Deadline Expiry" in [workflowpatterns].
· Data Errors
- Superficial data error
§ Can be checked without special processing (e.g., wrong argument type).
§ "This is the wrong form. You need the red form!" the administrator looked angrily through her thick glasses.
- Content level data error
§ Requires processing (e.g., incorrect URL).
§ "This is the right form. But I can't understand a word!" the administrator growled, pointing you to the door.
- Meta-data level error
§ Something wrong with meta-data, e.g., unknown sender, authorization, authentication, etc.
§ "Who are you and how did you pass security?!" the administrator tares her skirt and turns to combat position.
- Commitment/precondition violation
§ Correct data that violates some previous commitment in the process or requires a missing commitment.
§ "I've been waiting for over an hour!" screams the administrator at the delivery guy. "I ordered at 3:21, as you can see in the receipt. I am not paying for this Pizza."
One way to handle a received exception at run-time is to call the Activity again. There are solutions that do so automatically and even try various implementations of the Task, e.g. [On optimal service selection]. So there is no need to bother the user, at design time, with this type of solution. It is adequate for the Process Exceptions: in most cases the business analyst won't know how to handle such a technical error anyway, and most likely there is not much to be done about it. ("Although it is useful to specify, at design time, what should be done if deadline has been reached." [workflowpatterns].) To conclude on Process Exceptions: the user can catch and handle any error he/she likes, but there is no point in the system going around issuing warnings "this can fail", "that can fail", "they might fail!" Of course any Activity can fail for technical reasons, regardless of whether it will be executed by a human or by a machine Service.
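To illustrate this run-time strategy (a sketch of the general idea only, not the algorithm of the cited work), a retry loop that falls back to alternative implementations of the same Task might look like this:

```python
# Sketch of automatic run-time handling of a Process Exception: retry the
# Activity, possibly switching to another implementation of the same Task.

class ActivityFailure(Exception):
    """Stand-in for a low-level Process Exception (failure or timeout)."""

def run_with_retry(implementations, retries_per_implementation=2):
    """Try each available implementation of the Task a few times before giving up."""
    last_error = None
    for implementation in implementations:
        for _ in range(retries_per_implementation):
            try:
                return implementation()
            except ActivityFailure as err:
                last_error = err  # technical failure: retry, or move on to the next implementation
    # Nothing worked: propagate upwards, just like the uncaught Error Event in diagram 4.4.
    raise last_error if last_error else ActivityFailure("no implementation available")
```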
As for the Data Errors, the superficial data errors, such as type matching, are already handled by existing tools.
So we are left with the real "meat" of the data – the meaning of its content fields and its meta-data attributes. Unfortunately, a truly intelligent solution for this problem will only be considered in "Future Work", since verifying correct content is a serious and challenging Domain-Specific-Knowledge and Natural-Language-Processing problem. Until that future arrives, we leave it to the intelligence of the business analyst to specify how to verify the content of the data using a BPMN Gateway. The Gateway may result in a sequence flow that throws a business-level exception – an Error Event. This is a true high-level exception, unlike the low-level exceptions caused by technical failures, and we call it a specified/declared/thrown Intermediate Error Event. It makes sense that an Error Event thrown by the business process, as the result of a business decision, will be more crucial than the other types of Error Events.
For example, in figure 6.6, let's say that we either have access to the Customer's business process, or that the Customer publishes the possible Error Events of its business process.
Let us also say that the "Accept Payment" Task might throw a 'Payment Denied' Error Event. This information is declared by the Customer, while the fact that "Send Invoice" might throw 'Invoice Rejected' is not. We will consider 'Payment Denied' a business-level exception that should be handled, while the 'Invoice Rejected' exception is unknown to us and will not be considered.
This concludes the answer to question number 4 (q4) – our research scope includes only specified Error Events. We assume that specified Error Events represent business-level exceptions.
Message from the future: we said that identifying unspecified Error Events, which represent lower-level exceptions, is not in our scope. Nevertheless, there are systems where the thrown Error Events are hidden from us – we don't have access to the information. Identifying specified Error Events in such systems is essentially identifying unspecified Error Events, since what use is the specified information if we cannot access it?
The discussion now moves to the level of handling specified Error Events. Before we can touch on how to force the user (q1), or how to help the user (q3), to handle an Error Event, we investigate how to identify unhandled specified Error Events at design time (q2). The investigation proceeds in a top-down manner.
Identifying unhandled specified Error Events
Let's consider the attributes required to verify Exception Handling.
We add two attributes to the BPMN Activity:
· Errors - A list of Intermediate Error Events' Error-names that may be thrown by this Activity.
· Handlers - A list of Intermediate Error Events that are (graphically) attached to this Activity i.e. will catch exceptions.
And we add one attribute to the BPMN Process:
· Errors - A list of Intermediate Error Events' Error-names that may be thrown by this Process.
Consistency rules:
v The attributes of a Process and of a Sub-Process referencing that Process must match – attributes such as Id and Errors.
v An Activity's Handlers attribute must match the Error Events attached to the Activity.
v A Process's Errors must include all Errors declared in the sequence flow and any unhandled Error of any of its Activities.
The verifying procedure now has an easy job – statically verify that every error-name listed in an Activity's Errors is addressed, meaning that it either has a matching handler in the Handlers list or a matching error in the containing Process's Errors list. Otherwise, the tool raises a warning notification that forces the user to fix the situation. A fix is obviously one of: remove the Activity Error, add an Activity Handler, or add a Process Error.
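A rough sketch of such a procedure, using the attributes defined above (the data structures themselves are an assumption for illustration, not an existing tool's API):

```python
from dataclasses import dataclass, field

@dataclass
class Activity:
    name: str
    errors: set = field(default_factory=set)    # Errors: error-names this Activity may throw
    handlers: set = field(default_factory=set)  # Handlers: attached Intermediate Error Events

@dataclass
class Process:
    name: str
    errors: set = field(default_factory=set)    # Errors: error-names this Process may throw
    activities: list = field(default_factory=list)

def verify(process):
    """Warn on every Activity error that is neither handled nor re-thrown by the Process."""
    warnings = []
    for activity in process.activities:
        for error_name in activity.errors:
            if error_name in activity.handlers:
                continue                      # caught by an attached Intermediate Error Event
            if error_name in process.errors:
                continue                      # declared as thrown by the containing Process
            warnings.append("'%s' thrown by '%s' is unhandled: remove the Activity Error, "
                            "add an Activity Handler, or add a Process Error."
                            % (error_name, activity.name))
    return warnings
```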
This simple verification procedure, and the three attributes it requires to do its job, are the heart of this research and a gateway to a world of problems. Why? Because there are two ways to implement the attributes:
· Static attributes vs. dynamic attributes
Each implementation possibility affects the method used to populate the attributes:
· Automatically, semi-automatically or manually
The attribute implementation also affects the verification procedure's implementation requirements:
· Local vs. remote
· Immediate vs. daily
· Keep consistency vs. allow some inconsistencies
Lastly, there is the matter of system functionality to consider when reviewing the performance of each requirement:
· file accessibility – fast or slow
· file format – can we add static attributes to legacy data:
o Activity, inside a BPD
o Process and Task, in repository
Limiting the possible combinations, we have twenty-five use-cases for applying the static verification of unhandled specified Error Events. Let's dive in.
(1) Dynamic attributes vs. static attributes
The Errors and Handlers lists can either be dynamic – temporarily created and populated by the verifier – or static – the Activity meta-data is extended to include these lists. Dynamic attributes are populated by analyzing related Processes or by reading configuration files with information on referenced Activities, and are discarded once the static verification is over.
In the case of static attributes, the verifier does not populate the attributes; it just reads them. These attributes are populated by the user or by a separate procedure.
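The difference between the two options is mostly in who fills the lists. A sketch of the two population styles (the ActivityRef structure and the repository lookup are hypothetical, for illustration only):

```python
from dataclasses import dataclass, field

@dataclass
class ActivityRef:
    """A hypothetical view of an Activity as seen by the verifier."""
    name: str
    errors: set = field(default_factory=set)  # static attribute, if the file format stores it

def populate_errors_dynamically(activity, repository):
    """Dynamic attributes: build a temporary Errors list for the verifier.

    The repository dict stands in for 'analyzing related Processes or reading
    configuration files'; the result is discarded once verification is over.
    """
    return set(repository.get(activity.name, set()))

def read_static_errors(activity):
    """Static attributes: the verifier only reads what is already stored."""
    return activity.errors
```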
A short pros and cons: although it is understandable that many systems will prefer not to extend BPMN, keeping their element structure as is, it is quite safe to say that extending BPMN with the static attributes is better in the long run. Otherwise, a lot of machine power and memory will be wasted on building the dynamic attributes. But until that "long run", adding attributes to legacy data might introduce inconsistencies – more on that later in the inconsistency section.
A possible compromise, which can still lead to inconsistencies, is to have some attributes static and some dynamic. This is useful if, for example, you want to specify exceptions for a Task in a Process without adding static attributes to the Tasks in your repository.