Software reverse engineering:

•Chikofsky & Cross: two-phase process

Collecting information

•parsers, debuggers, profilers, event recorders

Abstracting information

•Making understandable, high-level models

“Programmers have become part historian, part detective, and part clairvoyant” (T.A.Corbi 1989)
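To make the two phases concrete, here is a minimal Java sketch under invented assumptions. The `Account` interface and `SimpleAccount` class are hypothetical stand-ins for the system under study. A `java.lang.reflect.Proxy` acts as a simple event recorder that collects raw calls (phase 1), and a call-count table stands in for the abstracted model (phase 2). Real tools collect far richer facts; the sketch only shows the split between collecting and abstracting.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoPhaseDemo {
    // Hypothetical interface of the system under study.
    public interface Account {
        void deposit(int amount);
        void withdraw(int amount);
        int balance();
    }

    // Trivial implementation standing in for legacy code.
    static class SimpleAccount implements Account {
        private int total;
        public void deposit(int amount) { total += amount; }
        public void withdraw(int amount) { total -= amount; }
        public int balance() { return total; }
    }

    // Phase 1 (collecting): wrap the target in a proxy that logs every method name before forwarding the call.
    static <T> T recordCalls(T target, Class<T> iface, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add(method.getName());
            return method.invoke(target, args);
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    // Phase 2 (abstracting): condense the raw event log into call counts per operation.
    static Map<String, Integer> abstractLog(List<String> log) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : log) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Account acct = recordCalls(new SimpleAccount(), Account.class, log);
        acct.deposit(100);
        acct.withdraw(30);
        acct.deposit(5);
        System.out.println("balance = " + acct.balance());
        System.out.println("raw log = " + log);
        System.out.println("model   = " + abstractLog(log));
    }
}
```

The raw log keeps every event in order, while the abstracted model discards ordering and keeps only what a reader needs at a glance.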

•Inputs to Reverse Engineering

Programming code

Database structure

Data

Forms and reports

Documentation

Application understanding

Test cases

Reverse Engineering

•Outputs from Reverse Engineering

Models

Mappings

Logs

Building the Class Model

•Building the class model using three phases

Implementation Recovery

Design Recovery

Analysis Recovery

Implementation Recovery

Learn about the application and create an initial class model

Study the data structures and operations and manually determine classes

Have an initial model focused on the implementation
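A small sketch of the implementation-recovery step, assuming the legacy code is Java and can be loaded: reflection lists each declared attribute and operation of a class, which gives an implementation-level first cut of the class model. The `Invoice` class is hypothetical, and the output format is an arbitrary choice for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ClassModelDump {
    // Hypothetical legacy class whose structure we want to recover.
    static class Invoice {
        private String number;
        private double amount;
        private java.util.Date issued;

        double total() { return amount; }
        void markPaid(java.util.Date when) { issued = when; }
    }

    // One line per attribute and per operation: an implementation-level class model.
    static List<String> describe(Class<?> c) {
        List<String> lines = new ArrayList<>();
        lines.add("class " + c.getSimpleName());
        Field[] fields = c.getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName)); // reflection order is unspecified
        for (Field f : fields) {
            if (!f.isSynthetic()) { // skip compiler-generated fields
                lines.add("  attr " + f.getName() + " : " + f.getType().getSimpleName());
            }
        }
        Method[] methods = c.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method m : methods) {
            if (!m.isSynthetic()) {
                String params = Arrays.stream(m.getParameterTypes())
                        .map(Class::getSimpleName)
                        .collect(Collectors.joining(", "));
                lines.add("  op " + m.getName() + "(" + params + ") : " + m.getReturnType().getSimpleName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(Invoice.class).forEach(System.out::println);
    }
}
```

Such a listing is only raw material: deciding which of these attributes are really associations is the job of design recovery.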

Design Recovery

The multiplicity in the reverse direction is typically not declared; it must be determined from general knowledge or by examining the code

Many implementations use a collection of pointers to implement an association with “many” multiplicity

Pointers will implement both association directions
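As a rough illustration of design recovery in Java, the sketch below inspects fields: a field whose type is a `Collection` is reported as a candidate “many” end, and a plain reference as a “one” end. `Department` and `Employee` are hypothetical, and the rule that only classes nested in the same outer class count as domain classes is an assumption made to keep the example self-contained. The reverse multiplicity still has to be confirmed from the code.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class AssociationRecovery {
    // Hypothetical legacy classes: one department has many employees.
    static class Department {
        String name;
        List<Employee> staff = new ArrayList<>(); // collection of references: candidate "many" end
    }

    static class Employee {
        String name;
        Department dept; // single reference back: candidate "one" end
    }

    // Lists candidate association ends found in the fields of class c.
    static List<String> associations(Class<?> c) {
        List<String> ends = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            Class<?> t = f.getType();
            if (Collection.class.isAssignableFrom(t)) {
                // Try to read the element type from the generic signature, e.g. List<Employee>.
                String target = "?";
                Type g = f.getGenericType();
                if (g instanceof ParameterizedType) {
                    Type arg = ((ParameterizedType) g).getActualTypeArguments()[0];
                    if (arg instanceof Class) {
                        target = ((Class<?>) arg).getSimpleName();
                    }
                }
                ends.add(c.getSimpleName() + "." + f.getName() + " -> " + target + " [many]");
            } else if (t.getEnclosingClass() != null && t.getEnclosingClass() == c.getEnclosingClass()) {
                // Assumption: classes nested in the same outer class are the application's domain classes.
                ends.add(c.getSimpleName() + "." + f.getName() + " -> " + t.getSimpleName() + " [one]");
            }
        }
        Collections.sort(ends); // reflection order is unspecified
        return ends;
    }

    public static void main(String[] args) {
        associations(Department.class).forEach(System.out::println);
        associations(Employee.class).forEach(System.out::println);
    }
}
```

Here `Department.staff` and `Employee.dept` are two pointer-based directions that a reverse engineer would likely merge into a single one-to-many association.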

Analysis Recovery

If the source code is not object oriented, infer generalizations by recognizing similarities and differences in structure and behavior
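One way to start looking for generalizations in non-object-oriented code is to compare record layouts: fields shared by every record are candidate superclass attributes, and the rest stay with the subclasses. The Java sketch below does only the structural comparison; the record names and field sets are hypothetical (think of C structs recovered from the source), and comparing behavior would still have to be done by reading the code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class GeneralizationHint {
    // Hypothetical record layouts recovered from non-object-oriented source.
    static Map<String, Set<String>> sampleRecords() {
        Map<String, Set<String>> recs = new LinkedHashMap<>();
        recs.put("CheckingAccount", Set.of("acctNo", "owner", "balance", "overdraftLimit"));
        recs.put("SavingsAccount", Set.of("acctNo", "owner", "balance", "interestRate"));
        return recs;
    }

    // Fields present in every record: candidate attributes of a common superclass.
    static Set<String> shared(Map<String, Set<String>> recs) {
        Set<String> common = null;
        for (Set<String> fields : recs.values()) {
            if (common == null) {
                common = new TreeSet<>(fields);
            } else {
                common.retainAll(fields);
            }
        }
        return common == null ? new TreeSet<>() : common;
    }

    // Fields found only in the named record: candidate subclass attributes.
    static Set<String> specific(Map<String, Set<String>> recs, String name) {
        Set<String> own = new TreeSet<>(recs.get(name));
        own.removeAll(shared(recs));
        return own;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> recs = sampleRecords();
        System.out.println("superclass candidates: " + shared(recs));
        for (String name : recs.keySet()) {
            System.out.println(name + " keeps: " + specific(recs, name));
        }
    }
}
```

Matching by field name is crude; in practice renamed or retyped fields need manual judgment.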

Building the Interaction Model

•The purpose of each method is clear but the way that objects interact to carry out the purposes of the system is hard to understand from the code

•A slice is a subset of a program that preserves a specified projection of its behavior

•The accumulated code lets you project an excerpt of behavior from the original program

•The power of slices comes from four properties:

They can be found automatically

Slices are generally smaller than the programs from which they originated

They execute independently of one another

Each reproduces exactly a projection of the original program’s behavior
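A static backward slice can be sketched in Java for the simplest case: straight-line code with no branches or loops, where only data dependences matter. Each statement is reduced to the variable it defines and the variables it reads; the five-statement program is made up for illustration. Handling real control flow would also require control dependences, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BackwardSlice {
    // One assignment: its line number, the variable it defines, and the variables it reads.
    static final class Stmt {
        final int line;
        final String def;
        final Set<String> uses;
        final String text;

        Stmt(int line, String def, Set<String> uses, String text) {
            this.line = line;
            this.def = def;
            this.uses = uses;
            this.text = text;
        }
    }

    // Hypothetical program: reads n, then computes a sum and a product.
    static List<Stmt> sampleProgram() {
        return List.of(
            new Stmt(1, "n", Set.of(), "n = read();"),
            new Stmt(2, "sum", Set.of(), "sum = 0;"),
            new Stmt(3, "prod", Set.of(), "prod = 1;"),
            new Stmt(4, "sum", Set.of("sum", "n"), "sum = sum + n;"),
            new Stmt(5, "prod", Set.of("prod", "n"), "prod = prod * n;"));
    }

    // Backward slice on the value of `var` at the end of straight-line code.
    static List<Integer> slice(List<Stmt> program, String var) {
        Set<String> relevant = new HashSet<>();
        relevant.add(var);
        List<Integer> kept = new ArrayList<>();
        for (int i = program.size() - 1; i >= 0; i--) {
            Stmt s = program.get(i);
            if (relevant.contains(s.def)) {
                kept.add(0, s.line);     // keep statements in program order
                relevant.remove(s.def);  // this definition supersedes earlier ones
                relevant.addAll(s.uses); // whatever it reads becomes relevant
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Stmt> program = sampleProgram();
        for (String var : List.of("sum", "prod")) {
            List<Integer> lines = slice(program, var);
            System.out.println("slice on " + var + ":");
            for (Stmt s : program) {
                if (lines.contains(s.line)) {
                    System.out.println("  " + s.line + ": " + s.text);
                }
            }
        }
    }
}
```

The two slices (lines 1, 2, 4 and lines 1, 3, 5) are each smaller than the program, can run on their own, and reproduce exactly the computation of one variable, which mirrors the properties listed above.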

Building the State Model

•To construct a state model

Fold the various sequence diagrams for a class together by sequencing events and adding conditions and loops

Augment the information in the sequence diagrams by studying the code and doing dynamic testing

Initiation and termination correspond to construction and destruction of objects
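The folding step can be sketched in Java, assuming dynamic testing has produced traces of “state, event, next state” steps for one class. The traces below, for a hypothetical `Connection` object, are invented; “Initial” is entered on construction and “Final” on disposal. Folding merges all traces into one transition table, and a state/event pair with several observed targets is exactly where a condition has to be added by reading the code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class StateModelFromTraces {
    // One observed step: source state, event, target state.
    static String[] step(String from, String event, String to) {
        return new String[] {from, event, to};
    }

    // Hypothetical traces of a Connection object recorded during dynamic testing.
    static List<List<String[]>> sampleTraces() {
        return List.of(
            List.of(step("Initial", "new", "Closed"), step("Closed", "open", "Open"),
                    step("Open", "send", "Open"), step("Open", "close", "Closed"),
                    step("Closed", "dispose", "Final")),
            List.of(step("Initial", "new", "Closed"), step("Closed", "open", "Open"),
                    step("Open", "send", "Failed"), step("Failed", "dispose", "Final")));
    }

    // Fold all traces into one table: "state --event-->" mapped to the set of observed target states.
    static Map<String, Set<String>> fold(List<List<String[]>> traces) {
        Map<String, Set<String>> table = new TreeMap<>();
        for (List<String[]> trace : traces) {
            for (String[] s : trace) {
                table.computeIfAbsent(s[0] + " --" + s[1] + "-->", k -> new TreeSet<>()).add(s[2]);
            }
        }
        return table;
    }

    // Pairs with more than one target state need a guard condition to disambiguate them.
    static List<String> needsGuard(Map<String, Set<String>> table) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : table.entrySet()) {
            if (e.getValue().size() > 1) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> table = fold(sampleTraces());
        table.forEach((k, v) -> System.out.println(k + " " + v));
        System.out.println("needs a guard: " + needsGuard(table));
    }
}
```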

Reverse Engineering Tips

•Distinguish supposition from facts

•Use a flexible process

•Expect multiple interpretations

•Don’t be discouraged by approximate results

•Expect odd constructs

•Watch for a consistent style

Wrapping

•A wrapper is a collection of interfaces that control access to a system

•Consists of a set of boundary classes that provide the interfaces

•Provides a clean interface for exposing the core functionality of existing applications

•New functionality can be added as a separate package

•Example

Web Application
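A minimal Java sketch of the wrapper idea described above, with everything in it hypothetical: `LegacyInv` stands in for legacy code with a cryptic, code-driven interface, `Inventory` is the boundary interface offered to new clients such as a web front end, and `InventoryWrapper` translates clean calls into legacy calls and legacy error codes into exceptions.

```java
import java.util.HashMap;
import java.util.Map;

public class WrapperDemo {
    // Stand-in for a legacy module with a cryptic interface (hypothetical).
    // op 1 = add stock, op 2 = remove stock; returns the new quantity, or -1 on error.
    static class LegacyInv {
        private final Map<String, Integer> tbl = new HashMap<>();

        int proc(int op, String code, int qty) {
            int cur = tbl.getOrDefault(code, 0);
            if (op == 1) {
                tbl.put(code, cur + qty);
                return cur + qty;
            }
            if (op == 2 && qty <= cur) {
                tbl.put(code, cur - qty);
                return cur - qty;
            }
            return -1;
        }
    }

    // Boundary interface offered to new clients.
    interface Inventory {
        int receive(String sku, int quantity);
        int ship(String sku, int quantity);
    }

    // The wrapper: forwards calls to the legacy code and turns error codes into exceptions.
    static class InventoryWrapper implements Inventory {
        private final LegacyInv legacy = new LegacyInv();

        @Override
        public int receive(String sku, int quantity) {
            return legacy.proc(1, sku, quantity);
        }

        @Override
        public int ship(String sku, int quantity) {
            int result = legacy.proc(2, sku, quantity);
            if (result < 0) {
                throw new IllegalStateException("cannot ship " + quantity + " of " + sku);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Inventory inv = new InventoryWrapper();
        System.out.println(inv.receive("A1", 10)); // 10
        System.out.println(inv.ship("A1", 4));     // 6
        try {
            inv.ship("A1", 50);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

New clients depend only on `Inventory`, so new functionality can live in a separate package while the legacy code stays untouched behind the wrapper.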

•Wrapping is a temporary solution

It is constrained by the organization of the legacy systems

Eventually the original code, the wrapper, and new code in a different form become so unwieldy that the system must be rewritten

•The use of XML as a gateway for communication between the legacy systems and the modern world

–better form of representation

–not always possible

–result depends on the parser (notable differences)

•Binaries

–faster information collection (e.g. Java byte code)

–legality issues
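For the XML gateway bullet, a minimal sketch of what a gateway might do: read a fixed-width legacy record and emit it as XML using the JDK's built-in DOM and `Transformer` APIs. The record layout (6-character id, 20-character name, 8-digit amount in cents) and the element names are invented for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlGateway {
    // Hypothetical fixed-width layout: columns 0-5 id, 6-25 name, 26-33 amount in cents.
    static String sampleRecord() {
        return String.format("%-6s%-20s%08d", "C00042", "Ada Lovelace", 125000);
    }

    // Parses one legacy record and returns it as an XML string.
    static String toXml(String record) {
        String id = record.substring(0, 6).trim();
        String name = record.substring(6, 26).trim();
        long cents = Long.parseLong(record.substring(26, 34));
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element customer = doc.createElement("customer");
            customer.setAttribute("id", id);
            doc.appendChild(customer);

            Element nameEl = doc.createElement("name");
            nameEl.setTextContent(name);
            customer.appendChild(nameEl);

            Element balance = doc.createElement("balanceCents");
            balance.setTextContent(Long.toString(cents));
            customer.appendChild(balance);

            // Serialize the DOM tree without the <?xml ...?> declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(sampleRecord()));
    }
}
```

The legacy side keeps its record format; modern clients only ever see the XML document.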

Usage of binaries

(reverse engineering, decompilation, disassembly)

•Recovery of lost source code

•Migration of applications to a new hardware platform

•Translation of code written in obsolete languages not supported by compiler tools nowadays

•Determination of the existence of viruses or malicious code in the program

•Recovery of someone else's source code (to determine an algorithm for example)
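Working directly on binaries starts with parsing the file format. For Java bytecode the layout is fixed by the JVM Specification (§4.1): a class file begins with the u4 magic number `0xCAFEBABE`, followed by the u2 `minor_version` and u2 `major_version`. The sketch below reads those three items from a class file found on the class path; using `java.lang.Object` is just a convenient choice because it is always available.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileHeader {
    // Returns {magic, minor_version, major_version} read from the class file of c.
    static int[] header(Class<?> c) {
        // Resource name relative to the class's package, e.g. "Object.class" or "Outer$Inner.class".
        String binaryName = c.getName();
        String resource = binaryName.substring(binaryName.lastIndexOf('.') + 1) + ".class";
        try (InputStream raw = c.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("class file not found for " + binaryName);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();           // u4, expected 0xCAFEBABE
            int minor = in.readUnsignedShort(); // u2
            int major = in.readUnsignedShort(); // u2, e.g. 52 = Java 8, 61 = Java 17
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] h = header(Object.class);
        System.out.printf("magic=%08x minor=%d major=%d%n", h[0], h[1], h[2]);
    }
}
```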

Binary copyrights

(decompilation, disassembly)

•Not all countries implement the same laws!

•Commonly allowed by law

for the purposes of interoperability

for the purposes of error correction where the owner of the copyright is not available to make the correction

to determine parts of the program that are not protected by copyright (e.g. algorithms), without breach of other forms of protection (e.g. patents or trade secrets)

•The decompilation page: home.html

Copyrights (cont.)

•EU: the 1991 EC Directive on the Legal Protection of Computer Programs introduced exceptions to copyright that permit decompilation in limited circumstances

•An example: Sony sued Connectix Corp. (1999) over its Virtual Game Station, a Mac emulator of the Sony PlayStation, which led to a long fight over emulation rights and the extent of copyright protection for computer programs

Reverse Engineering Tools

•Analysis Tools

•Browsers

•Object Server

•Task-Oriented Tools

Example--Java Decompiler

•How do you view the bytecode of a .class file under Unix/Windows with the JDK?

% javap -c <filename>

% javap -help (to see the options)

•Java Decompilers

“ClassCracker”

“DeCafePro” from DeCafe, France at

“SourceAgain” from Ahpahcorp at
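The `javap -c` command shown above can also be driven from Java. This is a minimal sketch assuming JDK 9 or later, where the JDK exposes `javap` through `java.util.spi.ToolProvider`; on a runtime without the JDK tools the lookup simply returns empty. Disassembling `java.lang.Object` is only a convenient default.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Optional;
import java.util.spi.ToolProvider;

public class JavapRunner {
    // Runs "javap -c <className>" in-process; empty if javap is unavailable or fails.
    static Optional<String> disassemble(String className) {
        Optional<ToolProvider> javap = ToolProvider.findFirst("javap");
        if (!javap.isPresent()) {
            return Optional.empty(); // e.g. a runtime image without the JDK tools
        }
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);
        int status = javap.get().run(outWriter, errWriter, "-c", className);
        outWriter.flush();
        errWriter.flush();
        return status == 0 ? Optional.of(out.toString()) : Optional.empty();
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "java.lang.Object";
        System.out.println(disassemble(target).orElse("javap not available or failed"));
    }
}
```

Note that `javap` is a disassembler: it prints the bytecode instructions, not reconstructed Java source, which is what the decompilers listed above attempt.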

•ClassCracker 2 Interface

Components of ClassCracker 2

Java decompiler

•retrieves Java source code from Java class files

Java disassembler

•produces Java Assembly Code

A Java class file viewer

•displays Java class file structures.

Features of ClassCracker 2

Visual user interface.

Can decompile class files within zip or jar files.

Conversion mode (JAVA, JASM or JDUMP) is selectable

A batch mode allows multiple class files to be decompiled simultaneously
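The jar/zip support and batch mode both start by enumerating the class files in an archive. The Java sketch below shows only that first step with `java.util.jar.JarFile`; it is not ClassCracker's implementation. The demo archive and its entries are hypothetical, and the entry bytes are placeholders rather than real class files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;

public class JarBatch {
    // Lists every .class entry in a jar or zip, in name order: the work queue for a batch run.
    static List<String> classEntries(Path archive) {
        try (JarFile jar = new JarFile(archive.toFile())) {
            return jar.stream()
                      .map(JarEntry::getName)
                      .filter(name -> name.endsWith(".class"))
                      .sorted()
                      .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a throwaway archive with dummy entries (placeholder bytes, not real class files).
    static Path demoArchive() {
        try {
            Path p = Files.createTempFile("demo", ".jar");
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(p))) {
                for (String name : List.of("com/example/B.class", "README.txt", "com/example/A.class")) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(new byte[] {1, 2, 3});
                    out.closeEntry();
                }
            }
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path archive = demoArchive();
        for (String entry : classEntries(archive)) {
            // A real tool would read this entry's bytes here and hand them to a decompiler.
            System.out.println("would decompile: " + entry);
        }
        Files.delete(archive);
    }
}
```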