
Design

The following are my thoughts on how a SCI decompiler might be built. I wrote them before reading any of the articles on decompilation, so they may change once I have reviewed those articles.

1. Decompile all scripts in one go.

It doesn't make much sense to decompile a single script in isolation, since the interdependencies between scripts would make this almost impossible. The decompiler should therefore expect to have access to all of the scripts for a game, in addition to VOCAB.000 (the parser vocabulary), VOCAB.999 (kernel function names), VOCAB.997 (selector names) and VOCAB.996 (the class table).

2. Start by decompiling those scripts that contain classes.

The VOCAB.996 file identifies every SCRIPT resource that defines a class. Since most scripts define instances of classes (e.g. every room object is an instance of Room), it makes sense to decompile the scripts that define classes first.

3. Decompile class-defining scripts in the order they were compiled.

Most scripts that define classes refer to classes that are defined in other scripts. For example, Object is defined in one script while Feature (which extends Object) is defined in another. It is likely that VOCAB.996 will give us an indication of the order in which these scripts were compiled, so the decompiler should start by decompiling scripts in the order they are encountered in this file.
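
As a rough illustration of this step, the sketch below pulls a script order out of the class table. The layout is an assumption rather than something established above: each entry is taken to be four bytes, little-endian, with the defining script's number in the second 16-bit word, and any resource header is assumed to have been stripped already. The function name `class_script_order` is made up for this example.

```python
import struct


def class_script_order(class_table: bytes) -> list[int]:
    """Return script numbers in the order they first appear in the class table."""
    order: list[int] = []
    seen: set[int] = set()
    # ASSUMPTION: 4-byte little-endian entries; the second 16-bit word is the
    # number of the script that defines the class. A trailing partial entry
    # is ignored.
    for offset in range(0, len(class_table) - 3, 4):
        _, script = struct.unpack_from("<HH", class_table, offset)
        if script not in seen:
            seen.add(script)
            order.append(script)
    return order
```

A script that defines several classes appears in the table more than once, so only its first appearance is kept. Whether that order really matches the original compile order is still the open question raised above.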

4. Decompile remaining scripts in the order they were compiled.

It is possible that VOCAB.997 will provide an indication of the order in which the remaining scripts were compiled (since presumably selectors are added to VOCAB.997 as new selector names are encountered during compilation). If we can scan the scripts to determine which scripts use which selectors, then we can get a rough idea of the order they were compiled in. Also, by scanning scripts to identify those that refer to external procedures defined in other scripts, we can determine dependencies and so further establish the order.
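
Once the dependencies between scripts are known, turning them into an order is a topological sort. A minimal sketch, assuming the scan has already produced a mapping from each script number to the set of scripts it depends on (the `depends_on` mapping and the `compile_order` name are hypothetical):

```python
from graphlib import TopologicalSorter


def compile_order(depends_on: dict[int, set[int]]) -> list[int]:
    """Order scripts so that every script comes after the scripts it depends on."""
    # TopologicalSorter expects {node: set of predecessors}, which is exactly
    # "script -> scripts it depends on". It raises graphlib.CycleError if two
    # scripts depend on each other; this sketch does not try to handle that.
    return list(TopologicalSorter(depends_on).static_order())
```

For example, `compile_order({3: {2}, 2: {1}})` places script 1 before script 2 and script 2 before script 3. Scripts with no dependency between them can come out in any relative order, so the selector-based ordering would still be needed to break ties.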

5. Build lookup maps for selectors, kernel functions, classes and procedures.

Anything that is stored externally to the script will need to be available in fast lookup maps. This implies that there will be a first pass through all the scripts in order to build up these maps, in addition to loading VOCAB.999, VOCAB.997 and VOCAB.996 into memory. In the case of VOCAB.996, the decompiler will need to find the names of the classes in the scripts that the class table points to. Luckily, class names, method names and property names are stored within the game files. Unfortunately this is not the case for procedures, so these names will need to be auto-generated.
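
The maps themselves could be as simple as the sketch below. It assumes the names have already been read out of the game files into Python lists. `SymbolTable`, `index_names` and the `proc<script>_<index>` naming scheme are all made up for illustration.

```python
from dataclasses import dataclass, field


def index_names(names: list[str]) -> dict[int, str]:
    """Map each name to its position, which serves as its lookup number."""
    return dict(enumerate(names))


@dataclass
class SymbolTable:
    selectors: dict[int, str] = field(default_factory=dict)
    kernels: dict[int, str] = field(default_factory=dict)
    classes: dict[int, str] = field(default_factory=dict)
    # Keyed by (script number, procedure index within that script).
    procedures: dict[tuple[int, int], str] = field(default_factory=dict)

    def procedure(self, script: int, index: int) -> str:
        """Return the name of a procedure, generating one on first use."""
        key = (script, index)
        if key not in self.procedures:
            # HYPOTHETICAL naming scheme: procedure names are not stored in
            # the game files, so any stable, unique name will do.
            self.procedures[key] = f"proc{script}_{index}"
        return self.procedures[key]
```

Generating the procedure name from the script number and index means every script that calls the same external procedure ends up with the same name, regardless of the order in which the scripts are decompiled.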

6. Identify common byte code patterns that represent particular keywords.

Everything else discussed up to this point is trivial compared with the actual process of decompiling the byte code into keywords and operators. What I envisage is that certain byte code sequences can be identified that are common to particular keywords or operators. For example, adding 5 to any type of variable should in theory look similar apart from how the value of the variable is obtained. Since all SCI0 games used the same SCI compiler (or versions similar enough that the differences can hopefully be ignored), the compiler should always have produced the same byte code for a given keyword or operator. A set of distinct byte code patterns needs to be identified, and those patterns then need to be recognised as particular SCI keywords or operators.
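
To make the "adding 5 to a variable" example concrete, here is a minimal sketch of one such pattern matcher. It works on already-disassembled instructions rather than raw bytes. The mnemonics (`lsg`, `lsl`, `lst`, `lsp`, `ldi`, `add`), the three-instruction shape of the pattern and the output syntax are assumptions for illustration and have not been checked against real compiler output.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    arg: Optional[int] = None


# ASSUMPTION: these opcodes push a global, local, temp or parameter variable
# onto the stack. Only this first instruction differs between variable types.
VAR_PREFIX = {"lsg": "global", "lsl": "local", "lst": "temp", "lsp": "param"}


def match_add_const(code: list[Instr], i: int) -> Optional[str]:
    """Recognise 'push variable, load immediate, add' starting at index i."""
    if i < 0 or i + 3 > len(code):
        return None
    load, imm, add = code[i:i + 3]
    if load.op in VAR_PREFIX and imm.op == "ldi" and add.op == "add":
        return f"(+ {VAR_PREFIX[load.op]}{load.arg} {imm.arg})"
    return None
```

With these assumptions, `match_add_const([Instr("lsl", 0), Instr("ldi", 5), Instr("add")], 0)` returns `"(+ local0 5)"`, and swapping `lsl` for `lsg` changes only the variable name in the result. A full decompiler would need a table of such matchers and a way to handle nested expressions, which this sketch does not attempt.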

