Software reverse engineering:
•Chikofsky & Cross: two-phase process
Collecting information
•parsers, debuggers, profilers, event recorders
Abstracting information
•Making understandable, high-level models
•“Programmers have become part historian, part detective, and part clairvoyant” (T. A. Corbi 1989)
•Inputs to Reverse Engineering
Programming code
Database structure
Data
Forms and reports
Documentation
Application understanding
Test cases
•Outputs from Reverse Engineering
Models
Mappings
Logs
Building the Class Model
•Building the class model using three phases
Implementation Recovery
Design Recovery
Analysis Recovery
•Implementation Recovery
Learn about the application and create an initial class model
Study the data structures and operations and manually determine classes
Have an initial model focused on the implementation
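For compiled Java code, part of implementation recovery can be mechanized. The following is a minimal sketch, assuming the legacy code is available as loadable Java classes. It uses reflection to list the fields and methods that one class declares, as raw facts for the initial class model. `Account` and `ImplementationFacts` are hypothetical names for illustration. Deciding which of these facts become model classes remains a manual step.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical legacy class standing in for the code under study.
class Account {
    private String owner;
    private long balanceCents;

    void deposit(long cents) { balanceCents += cents; }
    long balance() { return balanceCents; }
}

class ImplementationFacts {
    // Lists the attributes and operations one class declares, sorted by name
    // (reflection does not guarantee any particular order).
    static List<String> facts(Class<?> c) {
        List<String> out = new ArrayList<>();
        Field[] fields = c.getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        for (Field f : fields) {
            if (!f.isSynthetic()) { // skip compiler- or tool-generated members
                out.add("attribute " + f.getName() + " : " + f.getType().getSimpleName());
            }
        }
        Method[] methods = c.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method m : methods) {
            if (!m.isSynthetic()) {
                out.add("operation " + m.getName() + "/" + m.getParameterCount());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        facts(Account.class).forEach(System.out::println);
    }
}
```

For `Account`, this lists `balanceCents` and `owner` as attributes and `balance/0` and `deposit/1` as operations. That is exactly the implementation-level view the initial model starts from.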
•Design Recovery
The multiplicity in the reverse direction is typically not declared; it must be determined from general knowledge of the application or by examining the code
Many implementations use a collection of pointers to implement an association with “many” multiplicity
Pointers may implement both directions of an association
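The sketch below illustrates this design-recovery reasoning for Java code, assuming two hypothetical classes, `Department` and `Employee`, whose association is implemented by references in both directions. A field typed as a collection of model objects is read as a "many" end. A single reference is read as "0..1", because a Java reference may be null. A tighter bound such as exactly 1 is not visible in the declaration and must come from domain knowledge or the code. Recognizing `staff` and `employer` as the two directions of one association is left to the modeler.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical legacy classes: the association is implemented by references in both directions.
class Department {
    List<Employee> staff = new ArrayList<>(); // collection of references: a "many" end
}

class Employee {
    Department employer; // single reference back to the department
}

class AssociationRecovery {
    // Lists fields of `owner` that refer to other model classes, with a multiplicity guess
    // for the far end: "*" for a collection, "0..1" for a single (possibly null) reference.
    static List<String> associationEnds(Class<?> owner, Collection<Class<?>> model) {
        List<String> out = new ArrayList<>();
        Field[] fields = owner.getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        for (Field f : fields) {
            String prefix = owner.getSimpleName() + "." + f.getName() + " -> ";
            if (model.contains(f.getType())) {
                out.add(prefix + f.getType().getSimpleName() + " [0..1]");
            } else if (Collection.class.isAssignableFrom(f.getType())
                    && f.getGenericType() instanceof ParameterizedType) {
                // Element type of e.g. List<Employee>, recovered from the generic signature.
                Type arg = ((ParameterizedType) f.getGenericType()).getActualTypeArguments()[0];
                if (arg instanceof Class && model.contains(arg)) {
                    out.add(prefix + ((Class<?>) arg).getSimpleName() + " [*]");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Class<?>> model = List.of(Department.class, Employee.class);
        for (Class<?> c : model) {
            associationEnds(c, model).forEach(System.out::println);
        }
    }
}
```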
•Analysis Recovery
If the source code is not object oriented, infer generalizations by recognizing similarities and differences in structure and behavior
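The following is a rough sketch of the structural half of this step. It assumes the record layouts of a non-object-oriented program have already been extracted as sets of field names; the record and field names are hypothetical. Pairs of records that share at least `minShared` fields are flagged as candidates for a common superclass. Behavioral similarity still has to be checked by reading the routines that use the records.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class GeneralizationHints {
    // records: record name -> field names. Reports each pair sharing >= minShared fields.
    static List<String> candidates(Map<String, Set<String>> records, int minShared) {
        List<String> names = new ArrayList<>(records.keySet());
        Collections.sort(names); // deterministic output order
        List<String> out = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                Set<String> common = new TreeSet<>(records.get(names.get(i)));
                common.retainAll(records.get(names.get(j))); // fields present in both layouts
                if (common.size() >= minShared) {
                    out.add(names.get(i) + " & " + names.get(j) + " share " + common);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> records = new HashMap<>();
        records.put("SavingsRec", Set.of("acctNo", "owner", "balance", "rate"));
        records.put("CheckingRec", Set.of("acctNo", "owner", "balance", "overdraft"));
        records.put("CustomerRec", Set.of("custNo", "name", "address"));
        candidates(records, 2).forEach(System.out::println);
        // prints: CheckingRec & SavingsRec share [acctNo, balance, owner]
    }
}
```

The shared fields suggest an abstract account superclass, with `rate` and `overdraft` left in the two subclasses.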
Building the Interaction Model
•The purpose of each method is clear, but the way objects interact to carry out the purposes of the system is hard to understand from the code
•A slice is a subset of a program that preserves a specified projection of its behavior
•The accumulated code (the slice) reproduces an excerpt of the original program's behavior
•The power of slices comes from four properties:
They can be found automatically
Slices are generally smaller than the programs from which they originated
They execute independently of one another
Each reproduces exactly a projection of the original program’s behavior
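The following is a minimal backward slice that makes a strong simplifying assumption: the program is straight-line code with no branches, loops, or pointers, so only data dependences matter. The walk starts from the slicing criterion (a variable at the end of the program) and moves backward. A statement is kept if it defines a variable that is still relevant, and its operands then become relevant. Practical slicers also track control dependences, typically over a program dependence graph. The `Stmt` class and the example program are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BackwardSlice {
    // One assignment "def = expression(uses)" in a straight-line program.
    static final class Stmt {
        final String text;
        final String def;
        final Set<String> uses;

        Stmt(String text, String def, String... uses) {
            this.text = text;
            this.def = def;
            this.uses = Set.of(uses);
        }
    }

    // Statements that can affect `var` at the end of the program, in original order.
    static List<String> slice(List<Stmt> program, String var) {
        Set<String> relevant = new HashSet<>(Set.of(var));
        Deque<String> kept = new ArrayDeque<>();
        for (int i = program.size() - 1; i >= 0; i--) {
            Stmt s = program.get(i);
            if (relevant.contains(s.def)) {
                relevant.remove(s.def);   // earlier values of def no longer matter here...
                relevant.addAll(s.uses);  // ...but the operands this statement reads do
                kept.addFirst(s.text);
            }
        }
        return new ArrayList<>(kept);
    }

    static List<Stmt> demoProgram() {
        return List.of(
                new Stmt("n = read()", "n"),
                new Stmt("sum = 0", "sum"),
                new Stmt("prod = 1", "prod"),
                new Stmt("sum = sum + n", "sum", "sum", "n"),
                new Stmt("prod = prod * n", "prod", "prod", "n"),
                new Stmt("avg = sum / n", "avg", "sum", "n"));
    }

    public static void main(String[] args) {
        System.out.println(slice(demoProgram(), "avg"));
        // prints [n = read(), sum = 0, sum = sum + n, avg = sum / n]
    }
}
```

The slices for `avg` and `prod` are each shorter than the program and can run on their own. Each still computes the same final value of its variable as the full program, which matches the properties listed above.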
Building the State Model
•To construct a state model
Fold the various sequence diagrams for a class together by sequencing events and adding conditions and loops
Augment the information in the sequence diagrams by studying the code and doing dynamic testing
Initiation and termination correspond to construction and destruction of objects
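One mechanical way to start the folding step is to merge the event traces for a class into a prefix tree, so that traces with a common beginning share states. This is a swapped-in technique (the prefix tree acceptor from grammar inference), not necessarily the procedure the text has in mind. The event names are hypothetical. The result has no loops or conditions yet. Here, S2 (after one `read`) and S6 (after two) are candidates to merge, which would turn the repeated `read` into a self-loop. That generalization is the judgment the modeler adds by studying the code and testing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TraceFolder {
    // transitions.get(s) maps an event to the next state; state 0 is the state right
    // after construction. Traces that begin with the same events share states.
    final List<Map<String, Integer>> transitions = new ArrayList<>();

    TraceFolder() {
        transitions.add(new TreeMap<>()); // initial state S0
    }

    void addTrace(List<String> events) {
        int state = 0;
        for (String event : events) {
            Integer next = transitions.get(state).get(event);
            if (next == null) {               // first time this event is seen in this state
                next = transitions.size();
                transitions.add(new TreeMap<>());
                transitions.get(state).put(event, next);
            }
            state = next;
        }
    }

    int stateCount() {
        return transitions.size();
    }

    // Hypothetical traces for one class, e.g. read off three sequence diagrams.
    static TraceFolder demo() {
        TraceFolder f = new TraceFolder();
        f.addTrace(List.of("open", "read", "close"));
        f.addTrace(List.of("open", "write", "close"));
        f.addTrace(List.of("open", "read", "read", "close"));
        return f;
    }

    public static void main(String[] args) {
        TraceFolder f = demo();
        for (int s = 0; s < f.stateCount(); s++) {
            System.out.println("S" + s + " " + f.transitions.get(s)); // e.g. "S1 {read=2, write=4}"
        }
    }
}
```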
Reverse Engineering Tips
•Distinguish supposition from facts
•Use a flexible process
•Expect multiple interpretations
•Don’t be discouraged by approximate results
•Expect odd constructs
•Watch for a consistent style
Wrapping
•A wrapper is a collection of interfaces that control access to a system
•Consists of a set of boundary classes that provide the interfaces
•Provides a clean interface for exposing the core functionality of existing applications
•New functionality can be added as a separate package
•Example
Web Application
•Wrapping is a temporary solution
Constrained by the organization of the legacy systems
Eventually the original code, the wrapper, and code in a different format become so unwieldy that the system must be rewritten
•The use of XML as a gateway for communication between the legacy systems and the modern world
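The following is a minimal sketch of a wrapper in Java. It assumes a hypothetical legacy routine, `LegacyCustomerFile.fetchRecord`, that returns a fixed-width text record. The boundary class `CustomerWrapper` (also hypothetical) is the only thing new code calls. It hides the record layout and also exposes the data as XML for clients outside the legacy environment. The XML is built by hand to keep the example short; a real gateway would more likely use an XML library.

```java
// Hypothetical legacy routine: columns 0-5 hold the id, columns 6-25 the name.
class LegacyCustomerFile {
    static String fetchRecord(int id) {
        String name = id == 42 ? "Ada Lovelace" : "UNKNOWN";
        return String.format("%06d%-20s", id, name);
    }
}

// Boundary class: new code depends on this interface, not on the record layout.
class CustomerWrapper {
    String nameOf(int id) {
        String rec = LegacyCustomerFile.fetchRecord(id);
        return rec.substring(6, 26).trim();
    }

    // Same data in XML, for callers that talk to the legacy system through a gateway.
    String toXml(int id) {
        return "<customer id=\"" + id + "\"><name>" + escape(nameOf(id)) + "</name></customer>";
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        CustomerWrapper w = new CustomerWrapper();
        System.out.println(w.nameOf(42)); // Ada Lovelace
        System.out.println(w.toXml(42));  // <customer id="42"><name>Ada Lovelace</name></customer>
    }
}
```

If the legacy record layout changes, only `CustomerWrapper` needs to change. New functionality can be built against the wrapper as a separate package.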
•Source code
–better form of representation
–not always possible
–result depends on the parser (notable differences)
•Binaries
–faster information collection (e.g. Java byte code)
–legality issues
Usage of binaries
(reverse engineering, decompilation, disassembly)
•Recovery of lost source code
•Migration of applications to a new hardware platform
•Translation of code written in obsolete languages that current compilers no longer support
•Determination of the existence of viruses or malicious code in the program
•Recovery of someone else's source code (to determine an algorithm for example)
Binary copyrights
(decompilation, disassembly)
•Not all countries implement the same laws!
•Commonly allowed by law
for the purposes of interoperability
for the purposes of error correction where the owner of the copyright is not available to make the correction
to determine parts of the program that are not protected by copyright (e.g. algorithms), without breach of other forms of protection (e.g. patents or trade secrets)
•The decompilation page: home.html
Copyrights (cont.)
•EU: the 1991 EC Directive on the Legal Protection of Computer Programs (91/250/EEC) permits decompilation in limited circumstances, such as achieving interoperability
•An example: Sony sued Connectix Corp. (1999) over its Virtual Game Station, a Mac emulator of the Sony PlayStation, which led to a long fight over emulation rights and the extent of copyright protection for computer programs
Reverse Engineering Tools
•Analysis Tools
•Browsers
•Object Server
•Task Oriented Tools
Example--Java Decompiler
•How do you view the bytecode of a .class file under Unix/Windows with the JDK?
% javap -c <filename>
% javap -help (to see the options)
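On JDK 9 and later, the same disassembler can also be called from Java code through `java.util.spi.ToolProvider`, which is convenient when many classes have to be processed. The following is a minimal sketch that assumes a full JDK (including the `jdk.jdeps` module) rather than a bare runtime. `JavapRunner` is a hypothetical helper name.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.spi.ToolProvider;

class JavapRunner {
    // Runs javap in-process and returns what it prints; args are the usual command-line options.
    static String javap(String... args) {
        ToolProvider tool = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not found: is this a full JDK?"));
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);
        int exitCode = tool.run(outWriter, errWriter, args);
        outWriter.flush();
        errWriter.flush();
        if (exitCode != 0) {
            throw new IllegalStateException("javap exited with " + exitCode + ": " + err);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Equivalent to "% javap -c java.lang.Runnable" on the command line.
        System.out.print(javap("-c", "java.lang.Runnable"));
    }
}
```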
•Java Decompilers
“ClassCracker”
“DeCafePro” from DeCafe, France at
“SourceAgain” from Ahpahcorp at
•ClassCracker 2 Interface
•Components of ClassCracker 2
Java decompiler
•retrieves Java source code from Java class files
Java disassembler
•produces Java Assembly Code
A Java class file viewer
•displays Java class file structures.
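To give an idea of what a class file viewer parses, the sketch below reads only the fixed-size start of a class file, as defined in the Java Virtual Machine Specification (section 4.1). That start consists of the `0xCAFEBABE` magic number, the minor and major version, and the constant pool count. It is an illustration, not ClassCracker's implementation. `ClassHeader` is a hypothetical name. The mapping from major version to Java release (major - 44) holds for Java 5 (major 49) and later.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassHeader {
    final int minor;
    final int major;
    final int constantPoolCount; // one more than the number of constant pool entries

    private ClassHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads u4 magic, u2 minor_version, u2 major_version, u2 constant_pool_count (big-endian).
    static ClassHeader parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file (bad magic number)");
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int cpCount = in.readUnsignedShort();
            return new ClassHeader(minor, major, cpCount);
        } catch (IOException e) { // e.g. EOFException for a truncated file
            throw new UncheckedIOException(e);
        }
    }

    // Java release for major versions 49 (Java 5) and later: 52 -> 8, 61 -> 17.
    int javaRelease() {
        return major >= 49 ? major - 44 : -1; // -1: older format, not mapped here
    }

    public static void main(String[] args) throws IOException {
        // Inspect this program's own compiled class file.
        try (InputStream in = ClassHeader.class.getResourceAsStream("ClassHeader.class")) {
            ClassHeader h = parse(in.readAllBytes());
            System.out.println("major " + h.major + " (Java " + h.javaRelease() + "), minor "
                    + h.minor + ", constant_pool_count " + h.constantPoolCount);
        }
    }
}
```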
•Features of ClassCracker 2
Visual user interface.
Can decompile class files within zip or jar files.
Conversion mode (JAVA, JASM or JDUMP) is selectable
Batch mode allows multiple class files to be decompiled at once
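Decompiling from archives in batch mode starts with building a work list of class files. The following is a minimal sketch using `java.util.zip.ZipInputStream` (jar files are zip archives). It collects the `.class` entry names that a batch decompiler would then process one at a time. `ArchiveScanner` and the demo entry names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ArchiveScanner {
    // Returns the names of all .class entries in a zip or jar stream, in archive order.
    static List<String> classEntries(InputStream archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(archive)) {
            ZipEntry e;
            while ((e = zin.getNextEntry()) != null) { // also skips the rest of the previous entry
                if (!e.isDirectory() && e.getName().endsWith(".class")) {
                    names.add(e.getName());
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    // Builds a small in-memory archive with placeholder contents, for demonstration only.
    static byte[] demoArchive(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zout.putNextEntry(new ZipEntry(name));
                zout.write(new byte[] {1, 2, 3}); // not a real class file
                zout.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] jar = demoArchive("com/example/A.class", "META-INF/MANIFEST.MF", "com/example/B.class");
        for (String name : classEntries(new ByteArrayInputStream(jar))) {
            System.out.println("queued for decompilation: " + name);
        }
    }
}
```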