The New Basic Decompiler

*** THE NEW IMPROVED BASIC DECOMPILER ***

This is a Windows program that requires Windows 95 or later. It overcomes the worst of the problems of selecting the parameters and smooths the rough edges of the previous version, but it is still a bit technical to use, inasmuch as the person using it needs to be used to manipulating hexadecimals. Of course, to use the finished product (Basic Source) you only need a knowledge of Basic.

GENERALLY HOW IT OPERATES.

GETTING STARTED

USING THE MAIN WINDOW.

HEXADECIMAL INVESTIGATOR

EXAMPLE

VIEW THE DECOMPILATION

HEX VIEW OF DECOMPILATION

INTERPRETING THE DECOMPILED TEXT

LIST OF ERROR MESSAGES

GENERALLY HOW IT OPERATES.

The user selects the EXE file to decompile and is then presented with the main window, which helps the user to select the parameters. When they are selected the user is able to select the Decompile-Go option, which does the decompilation. This causes a specially modified version of the old decompiler to run in a DOS box. You may then select the Decompile-View option to see the results. You may then alter the parameters and Go again. There is a Decompile-Compare option to look at the previous runs. This displays the number of errors etc. to help you to select the best parameters. There are also facilities to help knock the decompilation into shape.

GETTING STARTED

It might be a good idea to run through the worked example first. Otherwise:

1 - SELECT INPUT

When you run the program you are presented with a blank screen containing merely a menu bar and a start button.

You click on the start button and are presented with a standard File Selection box.

Here you select the input file. The first time you work on a particular program this will be an EXE file but if you look at the "Select Files Of Type" box at the foot of this window you will see that there are two other file types that can be opened, namely Decompressed EXE files (.DCE) and Parameter files (.PAR). These are created by the decompiler and on subsequent runs you will be able to save time by selecting these.

2 - DECOMPRESS INPUT.

At this point you still have a fairly blank screen. Just 2 buttons and a menu bar. You should click on the Decompress EXE File button. Two progress bars appear while this is taking place.

3 - SCAN FOR DATA FIELDS.

The main window is now displayed. This has a menu bar and all sorts of things described later. At this point it is a good idea (but not essential) to take the DataFields-Scan option in the menu bar.

Another progress bar is displayed while the program examines potential data fields and selects the best one. When complete the details of the best data field are displayed.

4 - DO THE DECOMPILATION.

Take the Decompile-Go option in the menu bar.

The decompilation takes place in a DOS box. A message usually (not always) appears in the DOS window when this is complete. The DOS window must then be closed by clicking on the X in its top right hand corner. Then there is another progress bar to wait for.

5 - VIEW THE RESULTS.

Take the Decompile-View option.

Otherwise you could use a word processor to look at the .DEB file which will have been created in the same directory as the input file.

Either way you now have your first decompiled source text. The chances are that the first attempt will not be very good. It may either be very short or full of messages such as: Error or Unrecognised Function or perhaps lots of CALLs to subroutines that don't exist. Most of the facilities in the main window are to help you to overcome these problems. See:

Interpreting The Text Output

Diagnostics

Using the main window

USING THE MAIN WINDOW.

The way you use this window depends on what problems you are having with your decompilation. After taking the DataFields-Scan option and then the Decompile-Go option (see Getting Started), the best third step is to view the decompiled output by taking the Decompile-View option and then use the Diagnostics to determine the best way to use this window to produce an acceptable decompilation. However, in this section I'll describe each facet of this window, starting with the menu bar:

MENU BAR OPTIONS

Files-Save-Decompilation Please note: this menu bar option is for a future version.

Whenever you get a decompilation that you like it is a good idea to save it by taking this option.

Decompilations are stored in .DEB files but this operation saves the current .DEB file in a Dxx file where xx is a number 01-99. Thus you can store up to 99 different decompilations of a program. It is advisable to do this if you are taking actions which edit the decompiled output.

Files-Save-Parameters.

The parameters are saved automatically each time you take the Decompile-Go option.

You use this option to save them without doing an actual decompilation.

This saves the parameters in a Parameter file (.PAR) which may be selected when you subsequently start up the decompiler. It saves having to redefine the parameters each time. You may also return to a previous set of parameters using Decompile-Compare.

Files-Exit Exits from the program.

DataFields-Scan

This scans the input file in an attempt to locate the data field. It looks for blocks of String Descriptors, and the block with the most entries will be adopted. This will be used to set the values in the fields for the Physical and Logical Start Of Data.

If this is not correctly located the constant data and data strings in the source code will be incorrect and most probably look most peculiar!

The following are used to help get the correct values for this field:

DataFields-Investigate.

Physical start of data.

Logical start of data.

Adjust Values.

DataFields-Investigate

Note: You should take the DataFields-Scan option before using this option.

This displays the Hexadecimal Investigator, whose primary purpose is to help locate the data field correctly, but it has many other uses, e.g. scanning the .EXE file for a particular hexadecimal string.

Decompile-Go

This executes the actual decompilation by shelling to DOS to run a slightly updated version of the old decompiler, which will be referred to as the "DOS Program". The DOS window will appear and you may see the cursor darting about it. Unfortunately under Windows 98 the display of programs compiled using the Microsoft C Compiler (Version 6 was used to compile this) does not appear until the program finishes. If it finishes, a display appears asking you to close the window, which you do by clicking on the X icon in the top right hand corner. Sometimes this message doesn't appear. On such occasions you should close the window when you get fed up with waiting!

After closing the DOS window a progress bar appears. You can speed this up - see Don't Remove EOF Characters. Then you are returned to the main window. You may now view the results by taking the Decompile-View option.

Decompile-Compare

After you have decompiled (Using Decompile-Go ) the same program several times using different options you may wish to compare the results. This option does this by displaying the Comparison Window. This also enables you to retrieve the parameters of a previous attempt.

Decompile-View

This enables you to view the decompilation using the View window.

This also enables you to scan the decompilation for a particular string.

Decompile-HexView

This displays the Hex version of a decompilation in the Hexadecimal Display window.

It also has facilities for searching for strings and for resolving doubtful addresses (e.g. in GOTO statements), problem FOR/NEXT loops, etc.

Decompile-Subs

This displays a summary of all the Subroutines in the decompilation. It may be useful to print this out using the Print option that is in the menu bar of this display.

Decompile-Calls

This displays a summary of all the Calls in the decompilation. It may be useful to print this out using the Print option that is in the menu bar of this display.

That's the menu bar, now for the other features on the screen.

Progress bar - This is usually used when you have to wait.

Physical Address Of Start Of Data Field.

see Data Field

This is the number of bytes into the actual EXE file at which this field starts.

One of the big problems of Basic decompiling is locating this. Without it all the constant data and strings in the decompiled text will be wrong and will probably look very strange!

The value in this field is always passed as a parameter to the DOS Program.

A value may be typed directly into it but it is usually better to determine its value using the following:

DataFields-Investigate.

Physical start of data.

Logical start of data.

Adjust Values.

Logical Address Of Start Of Data Field.

This is the address of the start of the data field using the addressing system used in the EXE file.

E.G. if this value was xxxx in the EXE file, an assembler instruction to load (to AX) an integer at the very start of the data field would be MOV AX,xxxx

The value in this field is always passed as a parameter to the DOS Program.

A value may be typed directly into it but it is usually better to determine its value using the following:

DataFields-Investigate.

Physical start of data.

Logical start of data.

Adjust Values.
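The relationship between the two start addresses can be sketched in Python. This is only an illustration, assuming that the logical and physical start values refer to the same byte, so that any other logical address inside the data field maps to a file offset by a constant difference. The function names and the sample values are hypothetical and are not part of the decompiler.

```python
def logical_to_physical(logical, logical_start, physical_start):
    """Map a logical data address to an offset in the EXE file.

    Assumption: logical_start and physical_start describe the same byte,
    so every address in the data field differs by the same constant.
    """
    return physical_start + (logical - logical_start)


def adjust_start(logical_start, physical_start, delta):
    """Move both start addresses by the same amount, in step with each
    other (a negative delta moves the start of the field back)."""
    return logical_start + delta, physical_start + delta


# Hypothetical values: data field at logical 0x0040, file offset 0x3A40.
offset = logical_to_physical(0x0052, 0x0040, 0x3A40)
print(hex(offset))
```

Keeping the two values in step is the point: if only one of them is changed, every string and constant address is shifted by the difference.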

Adjust Values scrollbar.

This is used to synchronously change the Logical and Physical addresses of the start of the data field. When these addresses are determined automatically they are often relatively correct but they only point to the first string in the data field. There is often non-string data preceding this, and so the start of the field has to be moved back a little to encompass it. Otherwise the decompiler will regard these fields as variables.

It is probably better to achieve this by taking the DataFields-Investigate option.

EXE Type options

If you get this wrong a great deal of rubbish will be produced -- See Diagnostics.

These indicate to the decompiler whether the EXE file was originally created to run with a runtime module (E.G. BASRUN.EXE) or as a stand alone program. If the Automatic option is selected the DOS Program attempts to determine this automatically but often gets it wrong.

The best way to determine the EXE type is to do a Decompile-Go followed by a Decompile-View and look for the table described in section 2.3 of Interpreting the decompiled text. If the field 2.3.3 (See Interpreting The Decompiled Text) is ever non zero then it is almost certainly a "Run Time Module" type. Otherwise unless a "Subroutine Address Correction" is required it is a "Stand Alone" EXE.

Amount to decompile

If this is left blank or set to Automatic then the program decompiles as far as the first END statement or null byte. Unfortunately this often fails because of spurious null bytes right at the start of the program or in between subroutines.

If necessary it is best to start with a low value here. High values can often upset the DOS Program! A clue here can be obtained by looking at the Sector Table in the decompiled source text (see section 2.3 of Interpreting The Results). Generally, if this number is too low there are unresolved subroutine addresses or branches. If it is too high there is a lot of rubbish in the decompiled source, DOS functions are treated as subroutines, or the DOS Program fails.

See Diagnostics.

Compiler Version.

You might need to experiment with these.

Often a prior knowledge of the origins of the program helps here.

Another possibility is to use the Hexadecimal Investigator to look for strings such as "Microsoft Basic Compiler Version 4.5". This is often just before the data field, or sometimes near the very end of the EXE file.
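If you would rather search outside the decompiler, the same kind of search can be sketched in a few lines of Python. This is a minimal sketch, not something the decompiler provides: the function name, the keyword and the minimum run length are all assumptions chosen for illustration.

```python
import re


def find_version_strings(data, keyword=b"Microsoft", min_len=8):
    """Return (offset, text) for printable ASCII runs that contain keyword."""
    hits = []
    # A run of min_len or more printable ASCII bytes (space to tilde).
    for match in re.finditer(rb"[ -~]{%d,}" % min_len, data):
        if keyword in match.group():
            hits.append((match.start(), match.group().decode("ascii")))
    return hits


sample = b"\x00\x01Microsoft Basic Compiler Version 4.5\x00\xff"
print(find_version_strings(sample))
```

For a real file, read it in binary mode first, for example data = open("xxxxx.EXE", "rb").read(), and pass the result to the function.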

See Diagnostics.

Subroutine Address correction.

Often the addresses of both Subroutines and DOS functions are out. This results in many error messages or unrecognised functions - see Diagnostics. The correction factor (often negative) must be added here to rectify this. For determining the correction factor use Debug - see Debug.

Instruction Start override.

The instructions usually start at Hex 230 but if they don't you need to enter a value here. The best way to determine this is using the Hexadecimal Investigator. The start of executable instructions is frequently preceded by Hex 30 bytes of data preceded by lots of null bytes (Hex 00). If the instructions do not start at 230 it is usually because of a large table of numbers or addresses (Relocation table) at the start of the EXE file. Look for the end of these and you could well find the start of the instructions.
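A first guess at this value can also be read from the EXE header. The sketch below relies on the standard DOS MZ header layout (relocation count at offset 6, header size in 16-byte paragraphs at offset 8). It then assumes, following the description above rather than the decompiler's own code, that the instructions begin hex 30 bytes after the end of the header. The function name is hypothetical, and the result should be checked in the Hexadecimal Investigator.

```python
import struct


def guess_instruction_start(data, gap=0x30):
    """Estimate where the executable instructions begin in a DOS EXE.

    Returns (estimated start offset, number of relocation entries).
    Assumption: the instructions begin `gap` bytes after the header ends.
    """
    if data[:2] not in (b"MZ", b"ZM"):
        raise ValueError("not a DOS MZ executable")
    # Offset 0x06: relocation count; offset 0x08: header size in paragraphs.
    reloc_count, header_paragraphs = struct.unpack_from("<HH", data, 0x06)
    header_size = header_paragraphs * 16
    return header_size + gap, reloc_count


# Build a fake header whose size is 0x20 paragraphs (0x200 bytes).
header = bytearray(0x200)
header[0:2] = b"MZ"
struct.pack_into("<HH", header, 0x06, 5, 0x20)
start, relocs = guess_instruction_start(bytes(header))
print(hex(start), relocs)
```

With a header of hex 200 bytes this gives the usual hex 230. A large relocation count goes with a larger header and therefore a later start.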

Long Subroutine Addresses checkbox.

You will very rarely need to clear this checkbox because long addressing is almost always used for subroutines.

Long String Descriptors checkbox

There are two types of String Descriptors :

1 - Short. This is the commonest and consists of a word (2 bytes) containing the length of the string followed by a word containing the logical address of the string - 4 bytes in all.

2 - Long. This is 6 bytes (as opposed to 4) and consists of the Sector address followed by the logical offset of the string followed by a word containing the length of the string.

If you examine the data field using the Hexadecimal Investigator it is easy to determine the type of string descriptor.

Warning... It is often necessary to subtract 4 from the Logical Address Of Start Of Data Field when you have Long String Descriptors.
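The two layouts can be illustrated with a short Python sketch. It assumes that every part of a descriptor is a 16-bit word stored low byte first, as is usual on the PC; the function names and sample bytes are made up for the illustration.

```python
import struct


def parse_short_descriptor(data, offset):
    """Short form: length word, then logical address word (4 bytes)."""
    length, address = struct.unpack_from("<HH", data, offset)
    return {"length": length, "address": address}


def parse_long_descriptor(data, offset):
    """Long form: sector address, logical offset, length (6 bytes).

    Assumption: each of the three parts is one 16-bit word.
    """
    sector, address, length = struct.unpack_from("<HHH", data, offset)
    return {"sector": sector, "address": address, "length": length}


short = bytes([0x05, 0x00, 0x40, 0x12])
long_form = bytes([0x36, 0x04, 0x40, 0x12, 0x05, 0x00])
print(parse_short_descriptor(short, 0))
print(parse_long_descriptor(long_form, 0))
```

Note that the length comes first in the short form and last in the long form, which is usually enough to tell them apart in a hexadecimal display.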

Line number increment.

It is a good idea to use this to keep the highest line number under 30,000.

More rarely used parameters button.

You use this to get at the following 10 options:

Suppress Data Statements checkbox.

Often the decompiler places spurious DATA statements in the decompiled text. Obviously these are not used if there are no READ statements. Even if there are READ statements it can be useful to suppress the DATA statements to speed up the decompilation when you do your initial decompilations to tune the parameters. Merely checking this option can speed up the DOS program. It can speed things up further if used together with the Don't Remove EOF Characters checkbox.

Use Old Format Float Decimals checkbox.

Early versions (1.00 1.02 I think) used a different format for Real Numbers ( I call them Float Decimals as a hangover from PL/I). If you select these versions the old format is used automatically but this option might be necessary if you have a compiler version that isn't specifically catered for.

Assume Float load in version 7.2.

In version 7.2 there is an ambiguity in that if the assembler code in the EXE file moves two words from successive locations to successive locations then this could be either, moving one real value (this takes up 2 words of data) from one location to another, or moving two integers from and to successive locations. Unless this box is checked the former is assumed.
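The ambiguity is easy to see if the same four bytes are read both ways. The Python sketch below is only an illustration: it assumes that the real value is stored as an IEEE single-precision number, which the text above does not state, and the function name is hypothetical.

```python
import struct


def interpret_two_words(raw):
    """Show both readings of four consecutive bytes.

    Returns (value as one real number, value as two 16-bit integers).
    Assumption: the real number is in IEEE single-precision format.
    """
    as_real = struct.unpack("<f", raw)[0]
    as_integers = struct.unpack("<hh", raw)
    return as_real, as_integers


print(interpret_two_words(bytes([0x00, 0x00, 0x80, 0x3F])))
```

Both readings are perfectly valid data, so nothing in the moved bytes themselves tells the decompiler which one the original program meant.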

Don't prefer arrays checkbox.

Another frequent anomaly is that compiled Basic programs often refer to array elements by addressing them directly. Thus if an integer is referred to which happens to be 4 words after the start of an integer array, is it referring to element No. 3 of that array, or to a different integer field after the array because the array only contains 3 elements?

Normally the former is assumed. You tick in this box to assume the latter.

Don't Display Pass 1 checkbox.

As with compilers the decompiler does 2 passes to resolve addresses. The output of pass 1 is occasionally useful for diagnostic purposes. Check this box and it precedes the usual output.

Don't Display Dims For Non Dynamic Arrays.

All non dynamic fields are normally included in COMMON statements at the start of the decompiled text.

These aren't always necessary and can be regarded as a lot of unnecessary clutter.

Check this box to remove them.

Don't Display Symbol Table.

A symbol table is displayed at the end of the output. See Using The Symbol Table.

This can be regarded as a load of unnecessary clutter. Use this checkbox if you wish to remove it.

Search DOS version on /2

Don't Remove EOF Characters.

After the DOS Program has run all EOF characters in the decompiled source are replaced by blanks.

This can cause an annoying wait after a Decompile-Go has been executed. Checking this box speeds this up but if there are any EOFs in the output then the output to both the View window (See Decompile View) and to some word processors, that you may wish to use to operate on the output, will cease at the first EOF character.

This is not necessarily a big problem in the early stages of selecting the parameters. Also, as EOF characters usually only appear in DATA statements, it might be no problem if you check the Suppress Data Statements checkbox as well. Therefore in these circumstances it might be a good idea to check this box.

Note: Another possible source of EOF characters is in data strings particularly if the Start Of Data Field addresses are incorrect.
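If you check this box and later find that your word processor stops short, the EOF characters can be blanked out afterwards. The following Python sketch shows the idea. It assumes that the EOF character is Ctrl-Z (hex 1A), the usual DOS end-of-file marker; the function name is hypothetical and this is not the decompiler's own routine.

```python
EOF_BYTE = b"\x1a"  # Ctrl-Z, assumed to be the EOF character meant here


def blank_out_eof(data):
    """Return (cleaned bytes, number of EOF characters replaced)."""
    count = data.count(EOF_BYTE)
    return data.replace(EOF_BYTE, b" "), count


text = b'10 DATA "AB\x1aCD"\r\n20 END\r\n'
cleaned, replaced = blank_out_eof(text)
print(cleaned, replaced)
```

The file must be read and written in binary mode, otherwise some systems will themselves stop reading at the first EOF character.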

Library Function address correction.

If library functions are not being addressed properly it will cause many "Unrecognised Function" messages. To investigate these see 6 - If it contains a lot of "Unrecognised functions" . To correct the addressing of these functions you must check this box and the "Long Subroutine Addresses" box on the main window must also be checked - it usually is. When you check this box anything entered in the "Subroutine Address Correction" box will cease to apply to Library function addresses and a text window will appear below this check box. If a correction is required to the address of the library functions then it must be entered in this box.

Then you get the following 4 variables:

Variable start override.

This is the logical address at which the variables start. It is usually determined O.K. automatically. If it is too low then constants may be treated as the addresses of variables - vice versa if too high. There is an ambiguity in that any variable address might be a constant. The decompiler usually distinguishes these correctly but not always.

Limit working memory.

If the DOS program fails because of lack of memory it is a good idea to try Hex 8000 here and then keep on halving it until it runs successfully. Unfortunately there is a bug in Windows 98 that causes the "Out Of Memory" message to not be displayed. I intend to find a way of programming round this at some point in the future. Meanwhile, if the DOS program fails it is a good idea to assume this condition.
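The halving sequence is shown below as a small Python sketch. The lower limit of hex 100 is an arbitrary assumption made so that the list ends somewhere; the text above does not give a minimum.

```python
def memory_limits_to_try(start=0x8000, floor=0x0100):
    """Yield successive limits, halving each time until floor is reached."""
    value = start
    while value >= floor:
        yield value
        value //= 2


print([hex(v) for v in memory_limits_to_try()])
```

Each value is entered in turn, followed by a Decompile-Go, until the DOS program runs successfully.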

Special String Moves and Special String Concatenate.

Most Basic programmes use the usual Basic Library functions for moving or concatenating strings. There are a few that don't! These use special subroutines instead. In these (rare) cases the addresses of these subroutines should be entered here. If you have a programme compiled in this way it soon becomes obvious from the vast number of calls to these routines. The string addresses are passed to them in SI and DI.

Then there are up to 4 pairs of AddOn factors. These are called Logical and Physical. For programmes compiled to run with a Run Time Module, subroutines beyond the end of the program which do not consist of an Interrupt 3F are regarded as AddOns. For other programmes, any subroutine address that is higher than the value in the very first of these fields is regarded as an AddOn. Sometimes the address of the subroutine requires a correction factor. This (the amount to be added to calculate the address of the start of the AddOn in the Decompressed EXE file) is entered in the right hand field (marked Physical). Sometimes there are different AddOns with different correction factors, hence the 4 pairs of values: the Logical column contains the starting address used in the program for an AddOn and the Physical column contains the correction factor for that AddOn. Sometimes AddOns appear before the Library Subroutines. If this is the case then the checkbox to the right is checked.

THE HEXADECIMAL INVESTIGATOR

This is primarily intended to locate the constant data field but it has many other uses. Basically it gives a hexadecimal display of the EXE file alongside an ASCII display. In the bottom left of the display there is a list of likely locations of the constant data field. This has 3 columns as follows:

1 - Number Of Character Strings Detected.

2 - Physical address

3 - Logical address

The more character strings that have been detected in a field, the more likely this is to be the actual field that we are looking for and the higher in the list the particular entry is. When the window is first viewed the start of the field on top of this list is in the middle of the display. This constant data field is coloured red in the hexadecimal display.

If you click on one of the other entries in this list then that entry is said to be "Adopted" and the hexadecimal display moves to the start of that field - the Adopted field is coloured red. The location of the adopted field may be moved a byte at a time, either way, using the left hand scroll bar. The right hand scroll bar may be used to move the hexadecimal display to anywhere within the EXE file.

There is a button marked APPLY. When you click on this the logical and physical addresses of the currently Adopted constant data field are copied into the appropriate fields in the main window. The use of the EXIT button is self explanatory. There is a menu item - Scan. This enables the user to scan for a hexadecimal string in the EXE file. Once it is found it appears in the centre of the hexadecimal display - coloured green.

The adopted field (or any other field) should start with the first String Descriptor (4 bytes) followed by the actual string.
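For readers who have not met this kind of display before, a hexadecimal display alongside an ASCII display can be sketched in Python as follows. The layout is an assumption made for illustration and is not a copy of the Investigator's actual screen.

```python
def hex_dump(data, start=0, length=32, width=16):
    """Return lines of 'offset  hex bytes  ASCII' for part of a file."""
    lines = []
    end = min(start + length, len(data))
    pad = width * 3 - 1  # width of a full row of hex bytes
    for offset in range(start, end, width):
        chunk = data[offset:min(offset + width, end)]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        # Non-printable bytes are shown as a dot in the ASCII column.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:06X}  {hex_part:<{pad}}  {ascii_part}")
    return lines


for line in hex_dump(b"\x05\x00\x40\x12HELLO\x00\x00"):
    print(line)
```

In the sample the first 4 bytes play the part of a short String Descriptor and the string HELLO follows it, which is the pattern to look for at the start of the adopted field.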

DIAGNOSTICS

If the "Number Of Character Strings Detected" never exceeds 2 or 3 then suspect that this may not be a Basic program. See Determining whether a program is a Basic program.

If there is more than one field in which the "Number Of Character Strings Detected" is large (say exceeds 4) then it could be that the decompiler has made a fault in the Decompression (or you bypassed the Decompression phase). The best way to deal with this is to click on the field that has the highest Physical address. The 3rd and 4th bytes of this field should comprise an address - part of the String Descriptor. Subtract this address from the actual address of the string to calculate what I'll call the "X Factor". If you then examine the hexadecimal display backwards from this point you should soon come to a string descriptor belonging to the other field. The difference in the "X Factors" for these should equal the number of bytes missing (or added). The only way to deal with these is to correct the .DCE file using a hexadecimal editor.
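The "X Factor" arithmetic can be written out as a short Python sketch. The function names and the sample addresses are hypothetical, and the text does not say which sign of the result means missing bytes and which means added bytes, so the sketch only reports the difference.

```python
def x_factor(string_physical_address, descriptor_address):
    """Physical address of the string minus the address in its descriptor."""
    return string_physical_address - descriptor_address


def x_factor_difference(first_field, second_field):
    """Each argument is (string physical address, descriptor address).

    A non-zero result suggests that this many bytes are missing from,
    or have been added to, the .DCE file between the two fields.
    """
    return x_factor(*second_field) - x_factor(*first_field)


# Hypothetical readings taken from the hexadecimal display.
lower_field = (0x3A44, 0x0044)
higher_field = (0x5E12, 0x2410)
print(hex(x_factor_difference(lower_field, higher_field)))
```

If the two X Factors are equal, the fields agree with each other and the Decompression is probably not the cause of the problem.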

EXAMPLE

Example not yet created.

VIEW THE DECOMPILATION

This window views the actual decompiled text (See also Interpreting the Decompiled Text) . It has a scroll bar on the left for scrolling the text and a menu bar containing the following options:

Scan-ScanString This displays the Scan Box which enables you to scan the decompiled text for a specific character string. Once located the string is about a quarter of the way down the window and the parts following it are coloured blue.

Scan-Next After a string has been detected using the previous option, this option finds the next occurrence of the string.

Locate-SearchLocation If you locate a string using either of the above 2 options and then lose it by scrolling to a different part of the text then this option retrieves it by scrolling the view so that the string is located about a quarter of the way down the window.

Other facilities of this screen. Double click on a line. Then this line is treated as the last Scan destination - see menu bar options above.

HEX VIEW THE DECOMPILATION

This is the same as the ordinary view except that the hexadecimal code for each instruction appears on a line after the instruction. There are also what are called "Dump Lines" (See below) and there are additional menu options (Also described below).

Dump Lines contain the word Dump followed by a string of hexadecimals. They come in pairs and are often absent for EXE files produced by version 1 of the Basic compiler. They occur whenever an Assembler long CALL statement is in the code. This can either be a call to a subroutine or a Basic function. The first line contains the actual hexadecimal code of the CALL statement. E.G.

DUMP 9A CA 2B 36 04

The second line contains the hexadecimal code of the start (1st 19 bytes) of the called routine.
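The first Dump line can be decoded by hand or with a few lines of Python. Hex 9A is the x86 far CALL instruction, and it is followed by the offset word and then the segment word, each stored low byte first. The function name below is hypothetical, and the values are those stored in the file, before any correction factor is applied.

```python
def parse_far_call(dump_line):
    """Decode the first Dump line of a pair into (segment, offset)."""
    fields = dump_line.split()
    if not fields or fields[0].upper() != "DUMP":
        raise ValueError("not a Dump line")
    code = bytes(int(field, 16) for field in fields[1:])
    if len(code) < 5 or code[0] != 0x9A:
        raise ValueError("not a far CALL instruction")
    # Offset word first, then segment word, both low byte first.
    offset = code[1] | (code[2] << 8)
    segment = code[3] | (code[4] << 8)
    return segment, offset


segment, offset = parse_far_call("DUMP 9A CA 2B 36 04")
print(f"{segment:04X}:{offset:04X}")
```

For the example line above this gives segment 0436 and offset 2BCA, which is the destination to compare against the start of the called routine shown in the second Dump line.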

In addition to the menu options in the ordinary View screen there are the following menu options:

GetBranch This looks at the Assembler branch instruction after the last Scan destination and colours the hexadecimal code for it red. It then colours green the hexadecimal code for the destination and beyond and scrolls the view so that it is near the bottom of the window. NOTE: You may or may not have to scroll manually to see the part which is coloured red.

Locate-BranchSource This scrolls so that the text that was coloured red in the previous option is about a quarter of the way down the window.

Locate-BranchDestination This scrolls so that the text that was coloured green in a GetBranch operation is about a quarter of the way down the window.

These additional options are particularly useful for resolving bad branch locations which have '???? after them.

The Scan Box

This is used to search for:

A character string if you are in a View window or Hexadecimal View window.

A Hexadecimal string if you are in the Investigate window.

The search starts from the previous hit or the start of the text if there has been no previous search. This starting location can be reset to the start of the text by clicking the Reset Button. You start the search by clicking OK or get back to the text window without a search by clicking the Cancel button.

COMPARISON WINDOW

After you have made several attempts at decompiling, this window helps you to decide which set of parameters are the best or which ones to adopt.

It displays a list of parameters and results for each time you have performed a Decompile-Go or a Files-Save-Parameters. Each run is given an ID number. The most recent has an ID of 0, then the remainder are given numbers in ascending order. The parameters are stored in Parameter files. The file for ID=0 has the extension .PAR and the remainder have extensions of the form .Pxx where xx is the ID number. If you click on an entry in this list then its parameters are moved into the main window.
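The naming rule can be sketched in Python as follows. The sketch assumes that xx is the ID written as two decimal digits, by analogy with the Dxx files described earlier, and that IDs stop at 99; neither is stated outright above. The function name and the program name MYPROG are made up.

```python
def parameter_file_name(base_name, run_id):
    """Return the parameter file name for a run ID in the Comparison window.

    Assumption: xx is the ID as two decimal digits (01 to 99).
    """
    if run_id == 0:
        return f"{base_name}.PAR"
    if not 1 <= run_id <= 99:
        raise ValueError("run ID outside the assumed range 0-99")
    return f"{base_name}.P{run_id:02d}"


print(parameter_file_name("MYPROG", 0), parameter_file_name("MYPROG", 7))
```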

Here is a list of columns in the Comparison window:

1 - ID number - described above.

2 - Physical start of constant data field

3 - Logical start of constant data field

4 - Whether this program is assumed to have long String Descriptors .

5 - EXE Type (Stand alone, run time module or automatic)

6 - Number of bytes to decompile.

7 - Address correction for subroutines.

8 - Physical address of start of executable code.

9 - Result string from DOS program. If the DOS program fails this might contain the reason. Otherwise it contains "Successful".

10 - Number of unrecognised functions.

11 - Number of errors

12 - Last line number of decompiled source.

DIAGNOSTICS

After doing a Decompile-Go take the Decompile-View option to look at the normal View window:

1 - If this contains a lot of "error 1506" messages.

2 - If it contains a lot of "Unrecognised Ver 1 functions"

3 - If it contains lots of GOTO ????'???? or GOSUB ????'????

4 - If all the character strings are rubbish, contain strange characters or are truncated.

5 - If there is hardly anything (except perhaps END XX) in the normal View Window.

6 - If it contains a lot of "Unrecognised functions"

Currently this part of the manual is not complete. Eventually there will probably be about 20 diagnostics.

1 - If this contains lots of "error 1506" messages then:

If the EXE Type was not Stand Alone then change it to this and Decompile-Go again. If this does not remove most of the "error 1506" messages then:

Try increasing the Amount To Decompile and Decompile-Go again. If there are still "error 1506" messages left then: The cause of this error message is an unidentified Add On routine. If the EXE type has been selected as Run Time Module then any Function Subroutine that does not start with Interrupt 3F is regarded as an Add On. NOTE: Any Call destination beyond the Amount To Decompile is regarded as a Function Subroutine. For Stand Alone EXE types you must have specified Add Ons in the More Rarely Used Parameters window. Either way this might be caused by an actual Add On routine which isn't in the decompiler's list of Add On profiles. Admittedly this is not a very comprehensive list. However it might be a good idea to investigate this by going to the HEXADECIMAL VIEW and looking at the first DUMP line for the Call concerned. Deduce the destination for the Call and compare it with the Segment Address List, asking the question: Does It Look Right?

2 - If it contains "Unrecognised Ver 1 function" messages

then the EXE file is probably produced using Versions 2 or 3 of the Microsoft Basic compiler. Currently (There are plans to rectify this) the decompiler does not cater for this type of file. However they are both so similar to Version 1.02, which the decompiler does cater for, that if you select this Compiler Version you will probably get a fairly good result. Except that there will be rather more rough edges than with other versions.

3 - If it contains lots of GOTO ????'???? or GOSUB ????'????

Then try increasing the Amount To Decompile.

4 - If all the character strings are rubbish, contain strange characters or are truncated.

Then it is probable that the locations (Physical and Logical) are incorrect. This is best tackled using the DataFields-Investigate option. Sometimes it is a matter of just increasing or decreasing one of these factors by 1, 2 or 4 bytes. If all else fails it is a good idea to investigate the first reference to a character string using Debug.

5 - If there is hardly anything (except perhaps END XX) in the normal View Window then:

The most likely reasons are:

5.1 - The relocation table is so large that the entry point is beyond the usual location (Hex 230).

5.2 - There are some spurious non executable bytes near the start of the program.

5.3 - It is not a DOS Basic program.

To investigate this take the DataFields-Investigate option and start by pushing the right hand scroll bar up until the hexadecimal code of the start of the EXE file is displayed. After about byte 20 this is probably lots of binary zeros unless 5.1 is the case. It is best to confirm this by clicking the bottom of the right hand scroll bar until the locations 200-230 are displayed. If the following are true:

Location 202 contains the name of the program in ASCII.

Location 230 appears to contain the start of the executable code.

Prior to location 200 are lots of binary zeros.

Then it is best to assume that 5.2 is the case and start by setting the "Amount to decompile" to 1000, perhaps increasing it later. Otherwise:

If these locations contain what looks like columns of figures then it is best to assume that 5.1 is the case and you should click the bottom of the right hand scroll bar to display further into the EXE file. Eventually you will get to the end of the relocation table and should see the program name at location hex xxx2 and the start of the executable code hex 2E bytes further on. If this is the case then you should enter the hex location of the start of the executable code in the "Instructions Start Override" box in the Main Window.

Otherwise it might be a good idea to investigate (5.3) whether this is really a DOS Basic program.

6 - If it contains a lot of "Unrecognised functions"

Then either:

6.1 It is trying to regard Subroutines as functions - Increase the "Amount To Decompile" .

6.2 A Subroutine Address Correction is required.

6.3 If it is a "Runtime Module" EXE type then you might need to experiment by trying the different versions 4.0, 4.5 or 7.2.

Is it actually a Basic program?! The best indicator here is whether it managed to detect String Descriptors - see DataFields-Scan and DataFields-Investigate options. There is always a chance that it is a Basic program without any strings! Other ways are to do an ASCII scan to look for strings like a Microsoft copyright notice. Not all compiled Basic programmes contain these. Other things to look for are BRUN or BASRUN. Look for the program name either at location Hex 200 or immediately after the relocation table. The program entry point should be about hex 30 bytes past this. If all this fails then it probably isn't a Basic program. NOTE this decompiler doesn't work on Basic programmes compiled by non Microsoft compilers E.G. Power Basic.
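The ASCII scan mentioned above can be approximated outside the decompiler. The following is a rough sketch, not part of the decompiler itself: the list of marker strings is taken from the paragraph above, and the name find_markers is hypothetical.

```python
MARKERS = (b"Microsoft", b"BRUN", b"BASRUN")

def find_markers(data, markers=MARKERS):
    """Return {marker: first offset} for each marker found in the file bytes."""
    hits = {}
    upper = data.upper()          # compare without regard to case
    for marker in markers:
        pos = upper.find(marker.upper())
        if pos != -1:
            hits[marker.decode("ascii")] = pos
    return hits

# Made-up file contents for illustration only.
sample = bytes(16) + b"BRUN45" + bytes(4) + b"(C) Microsoft Corp"
print(find_markers(sample))
```

An empty result does not prove anything, as not all compiled Basic programmes contain these strings.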

USING DEBUG

This section is not complete. In fact it's hardly started. It does not contain general details about using Debug - you must look up your DOS manual for these. It will contain specific techniques useful for decompiling DOS Basic programmes.

Having just reread this section I appreciate that it is a bit cryptic! I apologise for this and promise to rewrite it one day when I have the opportunity. Meanwhile you'll have to struggle, particularly if you are not familiar with using Debug or working with Hexadecimals. If you are really desperate you could always send me an E-mail for help.

To use this on the program xxxxx.EXE then enter the DOS command:

debug xxxxx.EXE

Then the Debug command:

With some of the older (Some pre version 4.0) versions of Basic the CS and the DS will contain the same values and CS:IP will point to the start of the Basic program proper. (I.E. That which starts at Hex 230 in the EXE file).

With most programmes you will need to run them up to this point before you do anything else. Usually you do this with the command:

g DS:130

Strictly speaking you should use:

g xxx:30

where xxx is the contents of DS plus 10. However as this part is such a disaster prone operation it might be best to take some precautions first:

Location DS:0 should contain Hex CD 20

Location DS:80 and DS:102 should contain the program name.

Most important, unassemble location DS:130 (Better still xxx:30) and its vicinity and satisfy yourself that this location appears to be reasonably likely to be the start of the executable part of the program proper.

If all has gone correctly you should end up with debug stopped at location xxx:30 (I.E. The start of the program proper) having executed the Basic initialisation routines which will have entered the correct addresses in the CALL (To subroutines or Library functions) statements.
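The two forms of the g command above name the same byte of memory, because a real mode address is the segment multiplied by 16 plus the offset. The sketch below just does that arithmetic; the DS value is invented and the function names are hypothetical.

```python
def linear(segment, offset):
    """Real mode address: segment * 16 + offset (20 bit wrap)."""
    return (segment * 16 + offset) & 0xFFFFF

def program_start(ds):
    """The xxx:30 form described in the text, where xxx = DS plus hex 10."""
    return (ds + 0x10) & 0xFFFF, 0x30

ds = 0x1A2B                      # example value only; use the DS shown by Debug
seg, off = program_start(ds)
print(f"g {seg:04X}:{off:02X}")
print(linear(ds, 0x130) == linear(seg, off))
```

So for a DS of 1A2B the stricter command would be g 1A3B:30.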

You may now do the following:

To determine an address correction It is best to use the decompiler and Debug sessions in parallel and proceed as follows:

1 - In the Decompiler session go to the Hex output screen (after executing Decompile-Go ...etc...) and find a function call for which there is a couple of DUMP lines.

2 - In the Debug session Unassemble the offending assembler CALL (Code 9A). It will unassemble as something like CALL xxxx:yyyy say.

3 - Then enter U xxxx:yyyy to unassemble the start of the called subroutine or library function.

4 - The hexadecimal of this should be identical to the second DUMP line in the Hex display. If it is then no correction is required. Otherwise it is best to either use the Search facility in Debug or a Hexadecimal editor to determine the difference in location between the hexadecimal string displayed in the second DUMP line and the correct start of the called subroutine or library function, as displayed in Debug. This value should then be entered (You might need to multiply it by -1) in the Subroutine Address Correction box.
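If you prefer a script to a Hexadecimal editor for step 4, the difference can be worked out as sketched below. This assumes you have the memory image (or the EXE file) as bytes and have typed in the two hexadecimal strings by hand. The function name is hypothetical, and the sign convention is a guess, so as noted above you might need to multiply the result by -1.

```python
def address_correction(image, dump_bytes, correct_bytes):
    """Distance from where the DUMP line bytes occur to the correct routine start."""
    dump_at = image.find(dump_bytes)
    correct_at = image.find(correct_bytes)
    if dump_at == -1 or correct_at == -1:
        raise ValueError("one of the hexadecimal strings was not found")
    return correct_at - dump_at

# Invented data: 64 distinct bytes standing in for the program image.
image = bytes(range(64))
dump_line = bytes.fromhex("0A0B0C0D")      # what the second DUMP line showed
debug_line = bytes.fromhex("12131415")     # what Debug showed at xxxx:yyyy
print(address_correction(image, dump_line, debug_line))
```

Use hexadecimal strings long enough to occur only once, otherwise the first match found may be the wrong one.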

GLOSSARY OF TERMS

This glossary is by no means complete yet.

Bits

Like Bytes but even more fundamental. Each Bit may have one of 2 values. Often called "On" and "Off" or 0 and 1.

Bytes

This is a fundamental unit that makes up a file. An EXE file (or any other file) comprises many of these. Each one may represent a character. They may represent Data or executable instructions. Each one comprises 8 Bits which are often regarded as 2 units made up of 4 bits which may each have 16 different values, often represented by hexadecimals. File lengths are usually measured in Bytes.

Character Strings

Text, usually in the data field in the program to be decompiled. This comprises continuous bytes containing characters.

Data Field

This is the part of the program to be decompiled which contains the constant values and text.

Hexadecimals

16 characters used to represent the 16 values of half a Byte. These are 0-9 and A-F.
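As a small illustration of the above, this sketch splits a Byte into its two halves and shows each half as one hexadecimal character. The function name is made up for the example.

```python
def byte_to_hex(value):
    """Show one Byte (0-255) as two hexadecimal characters."""
    digits = "0123456789ABCDEF"
    high = value >> 4        # top 4 Bits
    low = value & 0x0F       # bottom 4 Bits
    return digits[high] + digits[low]

print(byte_to_hex(205), byte_to_hex(32))  # CD 20
```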

Logical Address

The program which is to be decompiled will contain its own address system to address parts of the program. This will probably be the Physical Address with a constant amount (Sometimes called the offset) subtracted from it.

Logical start of data

This is the Logical Address of the start of the data.

Physical Address

The Physical Address of an item is the number of Bytes from the start of the EXE file at which that item occurs.

Physical start of data

This is the Physical Address of the start of the constant data in the program being decompiled.
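The relationship between the two kinds of address can be sketched as follows, assuming (as the Logical Address entry says is probable) that they differ by a constant amount. The start values used here are invented for the example and the function names are hypothetical.

```python
def to_logical(physical, physical_start, logical_start):
    """Physical Address -> Logical Address, given the two starts of data."""
    return physical - (physical_start - logical_start)

def to_physical(logical, physical_start, logical_start):
    """Logical Address -> Physical Address, given the two starts of data."""
    return logical + (physical_start - logical_start)

PHYSICAL_START = 0x3A40   # invented: where the data starts in the EXE file
LOGICAL_START = 0x0040    # invented: the address the program uses for it

print(hex(to_logical(0x3A52, PHYSICAL_START, LOGICAL_START)))
print(hex(to_physical(0x0052, PHYSICAL_START, LOGICAL_START)))
```

If one of the two starts is out by 1, 2 or 4 bytes then every string fetched through this conversion will be shifted by the same amount.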

String descriptors

This is 4 (or 6) bytes that describe a character string by giving its address and length. See Long String Descriptors Checkbox.
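To show how a 4 byte descriptor leads to a string, here is a sketch. Be warned that the layout is an assumption and not something stated in this manual: it takes the descriptor to be a 2 byte length followed by a 2 byte Logical Address, both low byte first. The 6 byte form is not covered. All names are hypothetical.

```python
import struct

def read_descriptor(data_field, logical_address, logical_start):
    """Return (address, length) from a 4 byte descriptor (assumed layout)."""
    pos = logical_address - logical_start
    # Assumed layout: length word first, then address word, little-endian.
    length, address = struct.unpack_from("<HH", data_field, pos)
    return address, length

def read_string(data_field, logical_address, logical_start):
    """Follow a descriptor and return the bytes of the string it describes."""
    address, length = read_descriptor(data_field, logical_address, logical_start)
    start = address - logical_start
    return data_field[start:start + length]

# Invented Data Field: the text "HELLO" followed by its descriptor.
LOGICAL_START = 0x40
field = b"HELLO" + struct.pack("<HH", 5, 0x40)
print(read_string(field, 0x45, LOGICAL_START))
```

If the strings come out as rubbish with this layout then either the layout assumption or the Logical start of data is wrong.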

Strings

See Character Strings

INTERPRETING THE DECOMPILED TEXT

Generally the text output is in Basic code which is best interpreted using your Basic manual. There are however a few comments, error messages and other things inserted by the decompiler which will require a bit of interpretation.

1 - Error messages contain the word "error" followed by the error number. For a list of error numbers see

LIST OF ERROR MESSAGES.

2 - Here is a list of (odd!) things inserted by the decompiler:

2.1 - G Parameters (don't ask what G stands for!) These are listed on a comment line right at the start of the decompilation. There are 3 of them, respectively:

2.1.1 - The lowest logical address that a constant can have.

2.1.2 - The highest logical address that a constant can have.

2.1.3 - The lowest logical address that a variable can have.

2.2 - Another comment line (after 2.1) contains, respectively:

2.2.1 - The name of the programme being decompiled and the output file.

2.2.2 - If the program is being regarded as a Stand Alone program then there is a statement to that effect.

2.2.3 - Sometimes there is a spurious, self explanatory comment.

2.3 - A table will appear, usually near the start, containing a list of the Code Segment addresses encountered by the decompilation. The following values appear in columns for each segment:

2.3.1 - The hexadecimal address of the segment.

2.3.2 - The number of Calls made to the segment.

2.3.3 - The number of those calls in which the destination contained Interrupt 3F - This will indicate the presence of a Run Time module.

3 - Here is a list of comments inserted by the decompiler and their meanings. They are largely self explanatory:

3.1 - Length of GET/PUT destination = 30 Bytes

This appears in GET and PUT instructions. The handling of the variables in these instructions is not, as yet, very automatic. Either FIELD statements are used, which the decompiler handles OK, or the variable is a TypeDef Structure. In the latter case the user will have to deduce the format of the structure. This comment will give the length of the structure which is useful for this task.

3.2 - REM The subroutine just ended had 1045 bytes worth of parameters

The parameters in a subroutine are determined at its start. This figure is deduced from the number of bytes cleaned off the stack at its end. They can be a useful check particularly in programmes that have a complicated mess of subroutines.

3.3 - REM Block of null bytes length =

This occurs in blocks of non executable code. Usually between subroutines or after the end of the executable part of the programme.

3.4 - REM Previous line contained long integer

As yet, the decompiler doesn't deal with long integers. Where they occur it is left for the user to sort them out in the process of knocking off the rough edges. This remark provides help in this area.

3.5 - "REM Late Microsoft Basic with COM statements" or "REM COM() statements in this programme"

These programmes have a check for Comms interrupts at the beginning of each Basic statement.

ERROR MESSAGE NUMBERS

Error messages are placed in the decompiled text when the decompiler encounters something that isn't quite right. It might be a section of assembler code that cannot be made into a Basic instruction for some reason or just something suspicious like executing a GOTO while something is on the stack. Often errors occur when the decompiler attempts to decompile non executable code like the parts in between subroutines. When this occurs there are many errors together. Otherwise they could be caused by a flaw in the decompiler in that it does not cater for the particular sequence of assembler codes that are present. Let me know when you get this type and I'll modify the decompiler!

There may also be self explanatory error messages in the text (E.G. Stack error).

When errors occur it might be a good idea to investigate them or at least tread carefully.

The error message takes the form of:

REM error xxx [(yyy)]

where:

xxx is the error number.

yyy is a list of the contents of the stack at the time.

the part in square brackets is not present if there is nothing on the stack.
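When a decompilation contains many errors it can help to count them by number before looking anything up in the list. The sketch below does this for lines of the form shown above. The exact spacing of the message is an assumption, so the pattern may need adjusting to your output, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches "REM error xxx" with an optional "(yyy)" stack list after it.
ERROR_RE = re.compile(r"REM error (\d+)(?:\s*\((.*)\))?", re.IGNORECASE)

def parse_error(line):
    """Return (error number, stack text or None), or None if no error found."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

def tally_errors(lines):
    """Count how many times each error number occurs."""
    counts = Counter()
    for line in lines:
        parsed = parse_error(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts

# Invented sample of decompiled text.
text = ["REM error 12 (A%, 5)", "PRINT X", "REM error 12", "REM error 1506"]
print(tally_errors(text).most_common())
```

A large count for a single number, all close together, usually points to non executable code being decompiled rather than to many separate problems.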

Here is a list of error numbers with their rough meaning:

1 Assembler instruction 3B (Compare) with subfunction that has no known context.

2 Assembler conditional branch (7x) for which the condition codes have not been set.

3 Same as 2.

4 Assembler instruction 89 with subfunction that has no known context.

5 Interrupts 36, 38 or 34 to perform arithmetic operations have been encountered without a valid subfunction.

6 Interrupts 35, 37, 39 or 3B with subfunction 1E to get a value off the stack is not followed by a wait (Interrupt 3D).

7 Interrupts 35, 37, 39 or 3B with subfunction 9C or 9D to get a value off the stack into an array element (SI or DI) is not followed by a wait (Interrupt 3D).

8 Interrupt 35, 37, 39 or 3b (arithmetic operation) without recognised subfunction.

10 An interrupt 3A (arithmetic operation) requiring 2 values on stack but there are not 2 values on the stack.

11 Unrecognised interrupt code.

12 GOTO or GOSUB executed when there is data left on the stack.

13 Assembler instruction FF with subfunction that has no known context.

14 Unrecognised Assembler code.

17 Either COLOR, LOCATE or SCREEN statement has too many items on the stack.

18 An attempt to remove an item from the stack but there are no items on the stack.

20 Interrupt 3F Function FF Subfunction 27 & 28 (Exponent) with no items on stack.

22 Interrupt 3F Function 7D & 7B (Digest data for LINE statement) doesn't have 2 values on the stack.

23 Interrupt 3F Function 04, 0E & 0D (Digest data for CIRCLE statement) doesn't have 4 values on the stack.

24 Interrupt 3F Function 08 (CIRCLE) doesn't have enough values on the stack.

25 Interrupt 3F Function 6D (LINE) doesn't have enough values on the stack.

26 A series of Basic functions which should have one item on the stack but don't.

27 Interrupt 3F Function 39 (EOF) doesn't have any values on the stack.

28 Interrupt 3F Function 0F (DIM) doesn't have any values on the stack.

29 Interrupt FF Various Print functions - doesn't have any values on the stack.

32 Interrupt 3F Function 46 (MID$) doesn't have 3 values on the stack.

33 Interrupt 3F Function 65 (INSTR) doesn't have 3 values on the stack.

34 Interrupt 3F Function 9B (PMAP) doesn't have 3 values on the stack.

35 Interrupt 3F Function 86 (OPEN) doesn't have 3 values on the stack.

36 Interrupt 3F Function 87 (OPEN) doesn't have 3 values on the stack.

37 Interrupt F Various INPUT functions should have nothing on the stack.

38 Interrupt 3F Function C2 (Move string) doesn't have 2 values on the stack.

39 Interrupt 3F Function C3 (Concatenate string) doesn't have 2 values on the stack.

40 Interrupt 3F Function D3, D7 & D6 (STR$) doesn't have enough values on the stack.

41 Interrupt 3F Function 66 (KEY) doesn't have any values on the stack.

45 Interrupt 3F Function E7 (WIDTH) doesn't have 2 values on the stack.

46 Interrupt 3F Function 5B & 15 (WIDTH) doesn't have 2 values on the stack.

47 & 48 Interrupt 3F Function F9 (Some kind of stack manipulation) doesn't have enough values on the stack.

KEY, PLAY, STRIG or COM function without anything on the stack.

80 Assembler instruction D1 (Multiply by 2) in unknown context.

81 Assembler instruction C7 with subfunction that has no known context.

82 Assembler instruction 8B with subfunction that has no known context.

83 Assembler instruction 81 (Moves and Compares) with subfunction that has no known context.

84 Assembler instruction 83 with subfunction that has no known context.

85 Assembler instruction F7 (Integer Multiply/Divide or Negate) with subfunction that has no known context.

86 Assembler instruction 2B with subfunction that has no known context.

87 Assembler instruction 03 with subfunction that has no known context.

88 Assembler instruction 23 with subfunction that has no known context.

89 Assembler instruction 0B with subfunction that has no known context.

93 Interrupt 3F Function 4F (POS) doesn't have any values on the stack.

96 Assembler instruction 29 (Add to memory) with subfunction that is not known to constitute any Basic statement. This will appear within the Basic statement as a variable.

98 Interrupt 3F Function CB (MID$) doesn't have enough values on the stack.

99 Interrupt 3F Function D2 D3, D7 & D6 (STR$) doesn't have enough values on the stack.

100 Interrupt 3F Function F4 (END SUB) cannot be matched with a SUBROUTINE Statement.

101 Interrupts 36, 38 or 34 to perform arithmetic operations have been encountered when there are no values on the stack.

102 Interrupts 35, 37, 39 or 3B with subfunction 1C, 1D or 1F to get a value off the stack is not followed by a wait (Interrupt 3D).

110 Interrupt 3F Function CC (VIEW) doesn't have 3 values on the stack.

111 Interrupt 3F Function 6A & E0 (LCASE$ & UCASE$) doesn't have any values on the stack.

112 Used twice: An "ON xxxx GOSUB yyyy" statement without 2 entries on the stack. Interrupt 3F Function 40 (INPUT$) doesn't have enough values on the stack.

110 Interrupt 3F Function 82 (ON PEN GOSUB) doesn't have enough values on the stack.

120 Interrupt 3F Function FF Subfunction 09 (Compare strings) without 2 items on the stack.

121 Interrupt 3F Function FF Subfunction 80 (Set condition codes for compare) with less than 2 items on stack.

129 Interrupt 3F Function FF Subfunction A0 A1 & A2 (Print Currency) with less than 4 items on stack.

130 Interrupt 3F Function E4 (VIEW) doesn't have enough values on the stack.

132 Interrupt 3F Function E9 (WINDOW) doesn't have enough values on the stack.

133 Interrupt 3F Function 88 (PAINT) doesn't have enough values on the stack.

134 Basic function that requires 2 items on the stack but they are not there.

140 Interrupt 3F Function 9F (PAINT) doesn't have enough values on the stack.

142 Interrupt 3F Function 61 (PUT graphics) doesn't have enough values on the stack.

143 Interrupt 3F Function 41 (FIELD) doesn't have enough values on the stack.

144 This is used twice: Interrupt 3F Function 73 & BB (LSET & RSET) doesn't have 3 values on the stack. Interrupt 3F Function 3E (Part of the FIELD statement) doesn't have any values on the stack.

145 Interrupt 3F Function 5D & AA (GET & PUT) doesn't have 3 values on the stack.

678 Interrupt 3E without valid subfunction. (Version 1 of Microsoft Basic Compiler)

227 Assembler instruction 1B or 13 (ADC) in an unknown context.

228 Assembler instruction A5 (MOVSW) in an unknown context.

331 Interrupt 3F Function FF Subfunction 8E (Store Currency) with no items on stack.

329 Interrupt 3F Function FF Subfunction 8B & 8A (Currency calculations) with no items on stack.

331 Interrupt 3F Function FF Subfunction 7F 86 & 89 (Currency calculations) with less than 2 items on stack.

395 See 396 below but this is only for assembler code 01 BE

396 Assembler instruction 01 (Add to memory) with subfunction that has no known context. This will appear within the Basic statement as a variable.

401 Interrupts 35, 37, 39 or 3B with subfunction 1C to get a value off the stack when there is no value on the stack.

402 Interrupts 35, 37, 39 or 3B with subfunction 1D to get a value off the stack when there is no value on the stack.

403 Interrupts 35, 37, 39 or 3B with subfunction 1E to get a value off the stack when there is no value on the stack.

404 Interrupts 35, 37, 39 or 3B with subfunction 9E to get a value off the stack when there is no value on the stack.

405 Interrupts 35, 37, 39 or 3B with subfunction 5E to get a value off the stack when there is no value on the stack.

410 Assembler instruction 33 (XOR - usually zeroise) with subfunction that has no known context.

420 Interrupt 3D (Version 1 of Microsoft Basic Compiler) without recognised subfunction.

430 Assembler instruction 87 appears in an unknown context.

437 Assembler code 87 04 (usually SWAP) in unrecognised context.

532 Interrupt 3F Function 09 (CLOSE) doesn't have 4 values on the stack.

593 Assembler instruction 2B E1 appears in an unknown context. The only known context for this (SUB SP,CX) is to reserve space on the stack for a variable - possibly a string.

547 Interrupt 3F Function E2 (VARPTR$) doesn't have 2 values on the stack.

622 Interrupt 3F Function A2 (Digest data for PAINT) doesn't have 4 values on the stack.

626 & 627 Interrupt 3F Function FF Various arithmetic functions without enough items on the stack.

628 Interrupt 3F Function FF Subfunction 4B & 4C without 2 items on the stack.

629 This error code has been used twice: Interrupt 3F Function 6E (LINE INPUT) doesn't have 4 values on the stack. Interrupt 3F Function FF Subfunction AC (CREATEINDEX) with no items on stack.

630 Interrupt 3F Function FF Subfunction AC (CREATEINDEX) without enough items on stack.

660 Interrupt B0 or B1 (Add Stack to stack) without 2 values on stack.

661 Unfortunately I've duplicated this error number:

- Interrupt C0 or C1 (Add Stack to stack) without 2 values on stack.

- Interrupt 3E subfunction 74 VIEW (Version 1 of Microsoft Basic Compiler) with less than 4 items on stack.

645 This error code has been used twice:

Interrupt 3F Function A8 & A2 (PSET) Not enough values on the stack.

Interrupt 3F Function 5C 5E & A &9 AB (GET & PUT) doesn't have 3 values on the stack.

669 Interrupt 3E subfunction 8B PSET (Version 1 of Microsoft Basic Compiler) with less than 4 items on stack.

670 A string is too long in a concatenation.

671 & 672 Interrupt 3E subfunction 48, 49 or 70 PAINT (Version 1 of Microsoft Basic Compiler) with less than 4 items on stack.

673 Interrupt 3E subfunction 58 PUT (Version 1 of Microsoft Basic Compiler) with less than 4 items on stack.

678 This is used twice:

Interrupt 3F Function 9D (POINT) doesn't have 4 values on the stack.

Interrupt 3E subfunction 3A GET (Version 1 of Microsoft Basic Compiler) with less than 6 items on stack.

679 Interrupt 3E subfunction 8A PRESET (Version 1 of Microsoft Basic Compiler) with less than 6 items on stack.

683 Interrupt 3F Function 7E (NAME) doesn't have 2 values on the stack.

684 Interrupt 3F Function CA (IOCTL) doesn't have 2 values on the stack.

686 Interrupt 3F Function DA (SWAP) doesn't have 2 values on the stack.

687 Interrupt 3F Function DD (SWAP) doesn't have 4 values on the stack.

701 Too many subroutines were found. The only problem here is that the end of any functions past this point will need doctoring!

721 Interrupt FE without recognised subfunction.

732 Interrupt 3F Function CF (SEEK) doesn't have 2 values on the stack.

733 Interrupt 3F Function 03 (BSAVE) doesn't have 2 values on the stack.

734 Interrupt 3F Function 00 (Move string) doesn't have 2 values on the stack.

735 Interrupt 3F Function 02 (BLOAD) doesn't have 2 values on the stack.

737 Used twice: Interrupt 3F Function E5 (VIEW PRINT) doesn't have 2 values on the stack. Interrupt 3F Function 6F (LOCK & UNLOCK) doesn't have 2 values on the stack.

739 This error number has been used thrice:

Interrupt FE subfunction C3 (Concatenate) without 3 items on the stack.

Interrupt 3F Function 8A & 8B (PALETTE) doesn't have enough values on the stack.

Interrupt 3F Function 8E (PCOPY) doesn't have enough values on the stack.

745 This error number is used twice:

Interrupt 3F Function 5F & AC (GET & PUT) doesn't have 3 values on the stack.

Interrupt 3F Function D5 (TIME$ =) doesn't have 3 values on the stack.

746 Interrupt 3F Function A1 (PRESET) doesn't have 2 values on the stack.

747 Interrupt 3F Function AF (REDIM) doesn't have 5 values on the stack.

748 Interrupt 3F Function FF Subfunction C9 (REDIM PRESERVE) without 5 items on stack.

762 Interrupt FE without recognised subfunction.

750 Assembler code DF, DB, DD or D9 with subcode that has no known context.

751 Assembler code DA with subfunctions C1, C9 or F9 (arithmetic operations requiring 2 values on the stack) but there are not 2 values on the stack.

753 Assembler code 8B 54, 55 or 57 in unrecognised context.

757 Assembler code 8B 5C in unrecognised context.

762 Interrupt FE subfunction CE (Empty String) without anything on the stack.

765 Interrupt FF subfunction 53 (SCREEN) without anything on the stack.

774 Maths CoProcessor statement (D8 or DC) requires something on stack. It is not there!

775 Assembler code DA with subfunctions 36, 06, 26 or 0E (arithmetic operations using stack) but nothing is on the stack.

777 Interrupts 35, 37, 39 or 3B with subfunction 9F to get a value off the stack into an array element (BX) is not followed by a wait (Interrupt 3D).

779 Hex 21 not followed by hex d2 (I.E. 21 D2 is AND DX,DX - used for testing DX=0).

780 Same as 779 but for 22 C2 (AND AL,CL)

784 Assembler instruction xx 1E where xx = DF, DB, DD or D9 requires something on stack. It is not there!

786 Same as 784 but for xx 1F

820 Interrupt 3F Function FF Subfunction 08 (CLEAR) with no items on stack.

827 Interrupt 3F Function FF Subfunction 0B (RUN) with no items on stack.

829 A RUN function without anything on the stack.

837 Interrupt 3F Function FF Subfunction A3 (Input Currency) without enough items on stack.

851 Assembler instruction 8D with subfunction that has no known context.

863 Assembler instruction 8C with subfunction that has no known context.

910 Interrupt 3A subfunction 36, 26 or 06 arithmetic operation on value on stack but there is nothing on the stack.

912 Interrupt 3F Function FF Subfunction 94 (ON UEVENT GOTO) with no items on stack.

918 Interrupt 3F Function FF Subfunction AD (MOVE) without any items on stack.

921 Interrupt 3F Function EE (Sets condition codes for compare) doesn't have 4 values on the stack.

922 Interrupt 3F Function FF Subfunction AE (OPEN FOR ISAM) without 5 items on stack.

928 Interrupt 3F Function FF Subfunction AF (SEEK) without 3 items on stack.

932 Interrupt 3F Function 68 (KEY) doesn't have 2 values on the stack.

999 Too many items on the stack for the decompiler to cope. Unpredictable results.

1000 Internal problem with Add-on.

1500 Cannot recognise an Add-on.

1501 Add-on without enough on the stack.

1502 Add-on without enough on the stack.

1503 Financial Add-on without enough on the stack.

1504 CALL Communications Add-on without enough on the stack.

1505 Add-on without enough on the stack.

1506 Unrecognised Add-on function. This may be caused by bad parameters (see DIAGNOSTICS), particularly when the "Run time Module" option is either selected or assumed.

CONTENTS

GENERALLY HOW IT OPERATES.

GETTING STARTED

USING THE MAIN WINDOW.

Files-Save-Parameters.

DataFields-Scan

DataFields-Investigate

Decompile-Go

Decompile-Compare

decompile-view

Physical Address Of Start Of Data Field.

Logical Address Of Start Of Data Field.

Adjust Values scrollbar.

Long String Descriptors checkbox

Don't Remove EOF Characters.

Then you get the following 4 variables:

THE HEXADECIMAL INVESTIGATOR

EXAMPLE

VIEW THE DECOMPILATION

HEX VIEW THE DECOMPILATION

COMPARISON WINDOW

DIAGNOSTICS

1 - If this contains lots of "error 1506" messages then:

2 - If it contains "Unrecognised Ver 1 function" messages

3 - If it contains lots of GOTO ????'???? or GOSUB ????'????

4 - If all the character strings are rubbish, contain strange characters or are truncated.

5 - If there is hardly anything (except perhaps END XX) in the normal View Window then:

6 - If it contains a lot of "Unrecognised functions"

USING DEBUG

GLOSSARY OF TERMS

Bits

Bytes

Character Strings

Data Field

Hexadecimals

Logical Address

Logical start of data

Physical Address

Physical start of data

String descriptors

Strings

INTERPRETING THE DECOMPILED TEXT

ERROR MESSAGE NUMBERS