SemRep
=========
SemRep is Semantic Differential Repair tool for input validation and sanitization code. The tool analyzes and repairs validation and sanitization functions against each other. The tool does not need any manual specification or intervention. It takes two functions as Dependency Graphs then it looks for differences in validation and sanitization operations for string variables. If a difference is found, the tool suggests a set of three patch functions that can be used to fix the difference.
One application for the tool is to fix differences between a sanitizer function on the client-side and the correponding one on the server. The tool is language agnostic and can be used with Java, PHP or ASP.NET web applications. To achieve this agnosticism, the tool takes sanitizer functions in an intermediate representation that we call Dependency Graph which will be described in details below.
![img alt](Docs/arch.png)
Installing & Running Binary Files
======================================
The fastest way to try SemRep is to download the self-contained Ubuntu 12.04 64-bit binary files in the zip file [SemRepBinaries.tar.gz](SemRepBinaries.tar.gz). This compressed file contains:
1. **SemRep** which is a C++ Linux executable file generated with g++-4.8.1 on 64-bit ubuntu 12.04.
2. **mincut_codegen.jar** which is a java jar file.
3. [MONA](http://www.brics.dk/mona/) and [LibStranger](http://github.com/vlab-cs-ucsb/LibStranger) C shared libraries (**libmonadfa**, **libmonabdd**, **libmonamem**, **libstranger**) that were generated with gcc-4.8.1 on 64-bit ubuntu 12.04.
4. Three C++ shared libraries (**libboost\_regex-1.48**, **libboost\_system-1.48**, **libboost\_filesystem-1.48**, **libboost\_program\_options-1.48**) which are the same that are packaged with ubuntu 12.04.
5. Two python script: **run\_semrep.py** which is used to run the tool and an auxillary one called **patch\_result\_checker.py** to parse the tool output.
We will show how to run the tool to analyze and repair the [sample JS target function here](SemRep/test/js_version/target.js) against [the sample JS reference function here](SemRep/test/js_version/reference.js). Both Javascript functions' files come with the binary bundle. We will run the tool with a language-agnostic intermediate representation of the two functions that the tool takes as input which is called dependency graph. It is written in [dot](http://www.graphviz.org) format and will be explained later.
To run the tool we need to: (1) clone the repository, (2) uncompress the binary files and (3) run the python script. Here is how to do this:
```bash
$> git clone https://github.com/vlab-cs-ucsb/SemRep.git
$> mv ./SemRep/SemRepBinaries .
$> tar -xzvf SemRepBinaries.tar.gz
$> cd SemRepBinaries
# if you DO NOT have Java installed on your machine
$> sudo apt-get install default-jre
# the following will analyze and generate patches for the two example
# functions that come with the tool under directory name test
$> python run_semrep.py -r ./test/reference_depgraph.dot -t ./test/target_depgraph.dot -l JS -f x
$> ls output
# Here is the output tree with explanation
outputs/
├── raw_result_row.csv
├ : overall performance results
├── mincut_result_row.csv
├ : mincut performance results
├── generated_patch_codes
│ ├── final_patch.html
│ ├ : composition of patches generated (js)
│ ├── length_patch.html
│ ├ : simulation of length patch (js)
│ ├── sanitization_patch.html
│ ├ : generated code from mincut results. It can contain trim operation and/or escape operation and/or delete operation. (js)
│ ├── validation_patch.html
│ └ : simulation of length patch (js)
├── generated_patch_automata
│ ├── length_patch_BDD.dot
│ ├ : BDD that represents actual internal symbolic representation of length patch dfa
│ ├── length_patch_dfa_with_ASCII_transitions.dot
│ ├ : length patch dfa suitable for view by the user
│ ├── length_patch_dfa_with_MONA_transitions.dot
│ ├ : length patch dfa that is used to generate length patch simulation code
│ ├── reference_dfa_with_MONA_transitions.dot
│ ├ : auxilary dfa used by mincut algorithm escape and trim heuristics
│ ├── sanitization_patch_BDD.dot
│ ├ : BDD that represents actual internal symbolic representation of sanitization patch dfa
│ ├── sanitization_patch_dfa_with_ASCII_transitions.dot
│ ├ : sanitization patch dfa suitable for view by the user
│ ├── sanitization_patch_dfa_with_MONA_transitions.dot
│ ├ : sanitization patch dfa used by mincut algorithm to generate mincut alphabet
│ ├── validation_patch_BDD.dot
│ ├ : BDD that represents actual internal symbolic representation of validation patch dfa
│ ├── validation_patch_dfa_with_ASCII_transitions.dot
│ ├ : validation patch dfa suitable for view by the user
│ ├── validation_patch_dfa_with_MONA_transitions.dot
│ └ : validation patch dfa used to generate validation patch simulation code
```
Hopefully, by now you had a successful run (report a bug please if you did not) and your SemRep output directory
has the same tree shown above. The most important output is the final\_patch.html file. This file represents the output patch that SemRep generated to remove the difference between the target function and the reference one by strengthening the target against the reference. This patch is supposed to be "composed" with the target function i.e., it is supposed to be run along with (before) the target function as shown in the following picture:
![img alt](Docs/repair-figure.png)
The new patched target function is stronger than the original one and the reference i.e., it does not return a string that is not returned by any of the two. If you chose javascript as patch language (-l JS), you can immediately see the behavior of the final patch (and intermediate ones) by opening (double click) the generated .html files in your browser and trying some input values.
So, what did SemRep do? SemRep is an automatic repair tool. It takes two functions, a reference and a target, as its input and **generates a patch code** to strengthen the target function against the reference.
1. SemRep took the two input-validation-and-sanitization functions (represented as [dependency graphs]()) that are in the [test](SemRep/test) directory.
2. SemRep ran the differential repair algorithm described [here](Docs/issta14_paper.pdf) and generated a set of patch-automata under directory outputs/generated\_patch\_automata (see [here](http://en.wikipedia.org/wiki/Deterministic_finite_automaton) for more info on what an automata is). You can use xdot (sudo apt-get install xdot) to double click on any of these automata files and see it.
3. Java Code in MinCut package **generated the patch in Javascript language** (following input flag -l JS) in directory outputs/generated\_patch\_codes. The patch is a composition of a number of auto-generated Javascript patch-functions that either simulate the automata generated by SemRep or uses the MinCut algorithm.
Input format: what is a dependency graph?
-----------------------------------------
SemRep is language agnostic i.e., it can fix different input validation
and sanitization functions from different programming languages against
each other. To achieve this, SemRep takes an intermediate representation
of a function as its input. We call this representation a Dependency Graph.
![alt text](Docs/depgraphs/depgraphs.png)
The tool takes as input the dependency graphs of two sanitizer functions as
shown in the picture above.
A dependency graph specifies the data and control flow in the program.
It is a directed graph that has a finite number of nodes and directed edges.
If there is an edge from node N1 to N