Assembly Language Using MASM Basics

INTRODUCTION

The Intel 8086, available since 1978, is one of the most successful microprocessors. It is upward compatible with the advanced Intel processors based on the IA-32 architecture (programs written for it run on them), and it is the processor that almost every beginner in microprocessor studies invariably goes through. It has a fairly complex and quite powerful instruction set. A good understanding of the hardware registers and their capabilities, as well as of the instruction set, is needed to program the processor and get the best from it.

Students of microprocessor courses find programming in assembly language rather difficult. But assembly language programming is important, as it gives a very clear picture of the internals of the processor; it is perhaps the best way to study the different features the processor provides. In certain situations it is also the best way of using the processor, since it can be efficient both in memory use and in execution time. As such, it may be ideal for programming processors in embedded systems. The difficulty most people have with assembly language programming comes mainly from the two views that must be kept in mind while writing the program. First, the programmer must see the available hardware clearly, in terms of the registers and their capabilities. Second, the programmer must not lose the algorithmic perspective of the job being done. Managing both simultaneously is where the difficulty lies. In high-level languages one is free to concentrate on the algorithm alone, and the compiler handles the hardware register details without bothering the programmer. However, if an assembly language program is properly commented with reference to the algorithm used, it should not be too difficult to write it, or to understand its logic unambiguously at any later time.

In this book, care is taken to make the background required for programming very clear in the first two chapters. In the rest of the book, adequate examples are discussed to nail down the finer aspects of programming. Textbooks normally give a working program for a problem and leave it at that; no alternatives are discussed, and the logic of selecting a particular algorithm and of allotting registers to the variables to be handled is not seriously examined. In this book, the proper selection of registers and of algorithms is discussed in considerable detail, and in many instances more than one program is presented for solving a given problem.

It is to be noted that we have used only the simplest programming mode of MASM. Memory models such as tiny, small, medium and large are not used, as the focus here is on remaining as close as possible to the processor hardware, not on the advanced features of the tool, MASM.

Another important and unique feature of the book is that the programs given here are fully tested: MASM version 5.10 is used for assembling, and the debug environment for testing. The working of each program in debug is illustrated with study documents produced by debug during its operation. One can almost get hands-on experience by going through these documents.

The book presents programs at various levels of difficulty, from simple to complex. The stress, however, is on number-crunching programs, although some basic I/O programs, such as keyboard handling and simple screen displays, are included. It is also to be noted that the concentration throughout the book is on assembly language programming at the hardware level. This means that MASM-specific features such as macro libraries or modular programming, which are not functions of the processor hardware, are not seriously treated. Even so, by the end of Chapter 6 the reader should be in a position to create his/her own library of macros useful in programs handling large numbers. The individual macros are discussed at length, as they introduce the concept of generating our own instructions to add to the instruction set of the processor. The book is organized as follows:

In Chapter 1, a basic introduction to assembly language programming is presented, along with a study of the tools used. Appendices 1.A and 1.B elaborate on certain aspects raised in the chapter. Appendix 1.A also refers to the 8085 processor, to bring out the difference between the ALUs of the 8085 and 8086 processors in respect of the DAA and DAS operations. Appendix 1.B uses subroutines, which are described later in Chapter 4; if necessary, its study may be deferred until that chapter has been gone through.

Thorough descriptions of the register set and of the instruction set of the 8086 are presented in Chapter 2. They have to be fully understood before proceeding to program in assembly language. Just as a person must be thorough with the rules of chess before starting to play, an assembly language programmer must be completely confident of the register set, the register capabilities and the instruction set of the processor before beginning to write programs. Examples are given in this chapter for studying single instructions using debug by itself.

In Chapter 3, we see the basics of programming. Beginners generally study programming through specific programs for specific jobs, and almost invariably end up with the idea: one job, one program. Chapter 3 shows that programming is much more individualistic, almost to the level of one person, one program. Just as different persons may describe a given situation in different ways, different persons can write different programs to do a given job. In this sense programming is like an art, capable of variations to suit the taste of the programmer. Illustrating this effectively is the objective of the chapter.

Chapter 4 discusses macros and subroutines, which are quite useful in a programming environment. Both serve similar purposes: they reduce repetition from the programmer's point of view, and they freeze specific useful tasks into reusable, tested program units. They nevertheless have their own differences, and these differences are clearly brought out and illustrated in this chapter.


Chapter 5 is devoted to simple example programs that help beginners understand various aspects of developing working programs. In this process, one should observe that a program which works the first time in the lab is not good learning material. Only when one goes through and rectifies the errors that are inevitable in any program will the learning be complete.

Chapter 6 illustrates the power of the Intel 8086 processor in number crunching. Very large numbers, including large BCD numbers, are handled in this chapter. These programs are not suitable as beginners' learning material; they are there to show the capabilities of the processor and to motivate readers into getting enthusiastic about assembly language programming. One need not necessarily go into all the details, but one can believe that the details can be worked out with enough patience, see the results, and verify them.

The author is grateful to the Nitte Education Trust and to the Principal and the staff of the NMAM Institute of Technology for providing an encouraging atmosphere in which the author can peacefully pursue his interest. The staff and students of the National Institute of Technology, Karnataka, where the author worked earlier and was introduced to the 8086 processor, are gratefully acknowledged for motivating the author to study the intricacies of assembly language programming. I cannot, of course, fail to mention the constant support of my family in all my endeavors. I believe the book will be useful to staff and students in understanding the basics of assembly language programming.

K M Hebbar Copyright © 2008 K M Hebbar


1. ASSEMBLY LANGUAGE PROGRAMMING

Use of microprocessors in embedded systems catering to special equipment or needs, as well as in general-purpose personal computing systems, is continuously increasing. All these microprocessors need a lot of system and application software programmed into them before they can be used. Embedded systems are programmed once and for all during manufacture, while personal computers are supplied with system programs initially and may also be programmed by the user during use. This programming requirement can be met at different levels: at the machine language level, at the assembly language level, or in one of the high level languages (HLLs) like C or Java, with progressively increasing ease. Machine language programming is quite difficult, for it requires not only an intelligent adaptation of the processor hardware facilities and instruction capabilities to the problem at hand, but also the ability to handle commands coded as numbers. It is this difficulty of using a number-based command language that is overcome when we use assembly language. In an assembly language program (ALP), we use command words rather than command numbers. For example, in the machine language of the Intel 8085 processor, the binary number 01 in the two leftmost bits of an instruction denotes the MOV (move) command; in the register fields of this command, 001 denotes register C of the processor, while 000 denotes register B. Thus, to move the data in register B to register C, we use the machine language command 01 001 000 (48 hex). The assembly language version of this command is MOV C, B. The command in the ALP is clearly much more convenient for us to handle than the machine language string of 1's and 0's. A program called the assembler can then map the words and symbols of the ALP to the numbers of the machine language (ML).

Assembly language thus removes one of the difficulties for the programmer, for English words, or word-like character combinations, are much easier to remember and use than number symbols for commands.
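The mapping the assembler performs for this one 8085 command can be made concrete in a few lines. The Python sketch below packs the register field codes into the 01 DDD SSS layout described above; REG_CODE and encode_mov are names invented for illustration, and only the MOV register-to-register form is modelled.

```python
# Register field codes of 8085 MOV instructions (M = memory addressed by HL).
REG_CODE = {"B": 0b000, "C": 0b001, "D": 0b010, "E": 0b011,
            "H": 0b100, "L": 0b101, "M": 0b110, "A": 0b111}


def encode_mov(dst: str, src: str) -> int:
    """Return the 8085 opcode for MOV dst, src, laid out as 01 DDD SSS."""
    if dst == "M" and src == "M":
        # 01 110 110 is used for HLT, so MOV M, M does not exist.
        raise ValueError("MOV M, M is not a valid instruction")
    return 0b01000000 | (REG_CODE[dst] << 3) | REG_CODE[src]


print(format(encode_mov("C", "B"), "08b"))   # 01001000
print(hex(encode_mov("C", "B")))             # 0x48
```

The destination code sits in the middle three bits and the source code in the lowest three, which is why MOV C, B and MOV B, C (41 hex) are different numbers even though they use the same two registers.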

The programmer using assembly language (AL) must still have a complete knowledge of the register set of the processor and its capabilities before he can write an efficient program. Writing an ALP to solve a problem requires thinking at two distinct levels: one at the processor hardware level, and another at the problem level. Take the simple operation of multiplying two variables. At the problem or algorithmic level, all that is to be done is to multiply two numbers. At the processor or hardware level, one has to worry about where the variables are to be placed, in the processor registers or in memory, and where the result is to be put. HLLs differ from AL in this respect. When using an HLL, one need not be concerned about where in the hardware the variables are located. One can program directly in terms of the variables and the operations to be done on them, that is, think only in terms of the algorithm and not about implementation details in terms of the processor system hardware. The conversion of the HLL program to ML, using the hardware facilities available in a given processor, is done by an HLL compiler specific to the target processor. Different compilers may have different levels of program optimization. However, a specific problem may have special features, and the general-purpose optimization of an optimizing compiler may not be able to fully exploit them. An efficient human programmer may be better at exploiting such special features. Of course, this requires more effort on the part of the programmer, but once the effort is put in, the resulting gain in execution speed is available every time the program is used. An ALP can have this advantage over a compiled program: the raw power of the processor is best handled at the AL level rather than at the HLL level. Further, when using an HLL, the programmer is bound by the compiler in respect of the data types he may use; for example, integers longer than the largest integer type the compiler supports cannot be used directly. For further discussion on this, you may refer to the website http://webster.cs.ucr.edu/Articles/GreatDebate/index.html

Assembly language has one more advantage. Normally, people consider the processor a black box, but assembly language can give an insight into its working, making it possible to peep into the subunits of the processor and giving at least a grey-box view (if not a complete white-box view) of it. Consider the following very unconventional and almost meaningless-looking assembly language program for an 8085 processor:

CPI 0AH   ; compare register A with the hex number 0A
SBI 2FH   ; subtract the hex number 2F, with borrow, from register A
DAA       ; adjust register A for decimal addition! Note: DAA is meant to be
          ; used only after an addition; here we use it after a subtraction.

This three-line program converts the single hex digit in register A to its ASCII code (05h, for example, becomes 35h, the ASCII code of '5'). The corresponding 8086 program given below does not perform this conversion.

CMP AL, 0AH
SBB AL, 2FH
DAA

If we try to investigate the reason for this difference, we will discover quite a lot about how the ALUs of the two processors differ in their design. (See Appendix A for a brief discussion of these programs.)
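To see concretely where the two ALUs part ways, the Python sketch below models both three-instruction sequences flag by flag. The 8086 part follows Intel's published pseudocode for SBB and DAA. For the 8085, the rule used for AC after SBI (carry out of bit 3 of the complement addition) is an assumption, chosen because it reproduces the behaviour the text reports; run_8085 and run_8086 are illustrative names, not anything from the book.

```python
def run_8085(a: int) -> int:
    """CPI 0AH / SBI 2FH / DAA on a modelled 8085; returns the final A."""
    cy = 1 if a < 0x0A else 0                     # CPI: CY = borrow from A - 0AH
    # SBI 2FH as an addition of the one's complement of the operand plus the
    # complement of the incoming borrow.
    # ASSUMPTION: AC is the carry out of bit 3 of this addition.
    addend = ~0x2F & 0xFF
    total = a + addend + (1 - cy)
    ac = 1 if (a & 0x0F) + (addend & 0x0F) + (1 - cy) > 0x0F else 0
    cy = 0 if total > 0xFF else 1                 # borrow = no carry out
    a = total & 0xFF
    # DAA as specified for addition; the high test looks at the adjusted A.
    if (a & 0x0F) > 9 or ac:
        a += 0x06
        if a > 0xFF:
            cy = 1
        a &= 0xFF
    if (a >> 4) > 9 or cy:
        a = (a + 0x60) & 0xFF
    return a


def run_8086(al: int) -> int:
    """CMP AL,0AH / SBB AL,2FH / DAA on a modelled 8086; returns the final AL."""
    cf = 1 if al < 0x0A else 0                    # CMP: CF = borrow from AL - 0AH
    diff = al - 0x2F - cf                         # SBB: a true subtraction
    af = 1 if (al & 0x0F) - 0x0F - cf < 0 else 0  # AF = borrow out of bit 3
    cf = 1 if diff < 0 else 0
    al = diff & 0xFF
    # DAA per Intel's pseudocode: the high test uses AL and CF as they were
    # before DAA began.
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
    return al


for digit in range(16):
    print(f"{digit:X}: 8085 -> {run_8085(digit):02X}h, 8086 -> {run_8086(digit):02X}h")
```

In this model the 8085 column gives 30h to 39h and 41h to 46h, the ASCII codes of '0' to 'F'. The 8086 column gives 36h to 3Fh for the digits 0 to 9 and 40h for F; only A to E come out right. The difference traces to the auxiliary carry: the 8086 sets AF as a borrow during SBB, and its DAA then adds 6 that the 8085 never adds.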

To study the hardware details to any extent, we should be as close to the hardware as possible. The closest one can get without serious involvement with machine language is through assembly language. The only other language that can be considered in this context is C, which has some features of assembly language. But assembly language gives the best possible approach to developing an insight into the hardware of the processor.

Assembly language, however, requires thinking at two levels, as we have already noted: at the algorithmic level and at the hardware level. People feel that continuously thinking at two levels is a serious difficulty. It can, however, be looked at positively, in that it improves the mental faculties of concentration and focus. When an ALP is written, the instructions themselves show what happens at the hardware level, but how they relate to the problem or algorithm at hand is not very clear. After a couple of weeks, or even days, the programmer himself may not be able to understand why a certain instruction is present or what it does in the program. To overcome this difficulty, writing proper comments is a necessity. The instructions clearly give the hardware action; the comment should state the algorithmic basis for that action. A comment for comment's sake, as in the example shown below, says nothing beyond what the mnemonic says and should be strictly avoided:

MOV BX, AX    ; move the data from reg. AX to reg. BX

Note that comments are separated from the instructions by a semicolon, ";". In any line, the assembler ignores whatever comes after the semicolon. If the above instruction has to be commented, the comment, depending on the algorithmic context, can be something like:

MOV BX, AX    ; save AX in BX for later use

Writing the instructions with accompanying relevant comments at the problem level makes the process of thinking at two levels easier than otherwise.
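The semicolon rule can be pictured as a line splitter. The Python sketch below is a simplified model, not MASM's actual parser: it cuts a source line at the first semicolon, except that a semicolon inside a quoted string (as in a db string) is treated as data, which is assumed here to match the assembler's handling of quoted strings. split_comment is a hypothetical name.

```python
def split_comment(line):
    """Split one source line into (statement, comment).

    Simplified model: the first ';' outside a quoted string starts the
    comment; quoted text is kept as part of the statement.
    """
    quote = None
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:          # closing quote ends the string
                quote = None
        elif ch in ("'", '"'):
            quote = ch               # opening quote starts a string
        elif ch == ";":
            return line[:i].rstrip(), line[i + 1:].strip()
    return line.rstrip(), ""


print(split_comment("MOV BX, AX    ; save AX in BX for later use"))
```

Everything the function returns as the comment is exactly what the assembler discards, which is why the comment is free to speak about the algorithm rather than the hardware.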

Tools used for the ALP Studies: The basic tools required for studying the 8086 processor at the ALP level are (i) a macro assembler, MASM for example, (ii) the associated linker, and (iii) a debugger, DEBUG for example. We shall be using MASM (version 5.10), which assembles files with the .asm extension to produce a .obj file (and also a .lst listing file, if required), and the associated linker, which produces a .exe file from the .obj file(s). The .exe file can be studied in DEBUG. As we show later, simple studies that need no more than a few instructions (single-instruction studies, for example) can be done directly in DEBUG itself. We shall look into these tools one by one.

The DEBUG: DEBUG is a low-level facility that allows programs to be assembled as well as executed, either step by step, displaying all the register contents after each instruction is executed (tracing), or up to a specified break point. The trace display also shows the next instruction to be executed, along with any memory data relevant to its execution. When execution is run to a break point, the register contents and so on are displayed after the execution of the last instruction before the break. In the case of a subroutine, tracing through the subroutine can be suppressed, and the result of executing it can be seen on return from the subroutine. The format of the trace display in DEBUG is shown below:

-t
AX=0000  BX=0000  CX=0000  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000
DS=1377  ES=1377  SS=1377  CS=1377  IP=0100   NV UP EI PL NZ NA PO NC
1377:0100 8B4140        MOV     AX,[BX+DI+40]                      DS:0040=0050
-

The different fields of the trace display are described below:

-t  The trace command. Note that the "-" here is the debug prompt; the prompt is seen again after the trace operation is completed (in the 5th line of the display above), prompting for a fresh command to be issued.

AX=0000 ...          The thirteen registers other than the flags register (AX, BX, CX, DX, SP, BP, SI, DI, DS, ES, SS, CS and IP), with their contents in hex after the execution of the previous instruction.
NV UP EI ...         The eight flag conditions existing after the execution of the previous instruction, indicated explicitly as follows:
                       Overflow flag:    NV = No oVerflow; OV = OVerflow
                       Direction flag:   UP = address increasing (string instructions); DN = address decreasing
                       Interrupt flag:   EI = Enable Interrupt; DI = Disable Interrupt
                       Sign flag:        PL = positive or PLus; NG = NeGative
                       Zero flag:        ZR = ZeRo; NZ = Not Zero
                       Auxiliary carry:  AC = Auxiliary Carry present; NA = No Auxiliary carry
                       Parity flag:      PE = Parity Even; PO = Parity Odd
                       Carry flag:       CY = CarrY present; NC = No Carry
1377:0100 8B4140     The address of the next instruction and its machine language coding.
MOV AX,[BX+DI+40]    The next instruction in assembly language, ready for execution if a 't' command is given next.
DS:0040=0050         The relevant word data, at DS:[BX+DI+40], which is DS:[40] here, shown as 0050 (hex) at that location.
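The two-letter flag codes are just a readout of individual bits of the 8086 FLAGS register (CF bit 0, PF bit 2, AF bit 4, ZF bit 6, SF bit 7, IF bit 9, DF bit 10, OF bit 11). The Python sketch below renders a FLAGS value in DEBUG's order; FLAG_FIELDS and debug_flags are illustrative names, and the sample value 0202h (only IF and the reserved bit 1 set) is an assumed example, not taken from the session above.

```python
# (bit number in FLAGS, text when clear, text when set), in DEBUG's order
FLAG_FIELDS = [
    (11, "NV", "OV"),   # overflow
    (10, "UP", "DN"),   # direction
    (9,  "DI", "EI"),   # interrupt enable
    (7,  "PL", "NG"),   # sign
    (6,  "NZ", "ZR"),   # zero
    (4,  "NA", "AC"),   # auxiliary carry
    (2,  "PO", "PE"),   # parity
    (0,  "NC", "CY"),   # carry
]


def debug_flags(flags: int) -> str:
    """Render a 16-bit FLAGS value the way DEBUG's trace display shows it."""
    return " ".join(set_text if (flags >> bit) & 1 else clear_text
                    for bit, clear_text, set_text in FLAG_FIELDS)


print(debug_flags(0x0202))   # NV UP EI PL NZ NA PO NC
```

The trap flag (bit 8) is deliberately absent, which matches the eight conditions listed in the display.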

General features of the debug: The debug prompt is the '-' sign, as we have already seen. All debug commands are single-letter commands. A command normally has one or two parameters (sometimes a list of numbers) representing addresses, data or register names, given as hex numbers or register names following the command letter. The commands are not case sensitive: 'A' and 'a' carry out the same command, so A200, A 200, a200 and a 200 are all the same in debug, and so are u200 210 and U 200 210. At least one blank space between two parameters is obviously a must, but a space between the command character and the first parameter is optional. Only 16-bit register names may be used as parameters: 'ax' can be used with the register command, but not 'ah' or 'al'; 'rax' or 'r ax' is a valid command, but 'ral' and 'r ah' are not. We shall now look into some of the commands.
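The syntax rules just listed are regular enough to express as a small parser. The Python sketch below is hypothetical and is not how DEBUG is implemented: it normalises a command line to a command letter and a parameter list, and rejects 8-bit register names for r. The set REG16 is an assumption covering the registers shown in the trace display plus f for the flags, as used with -rf in the g-command session later in this chapter.

```python
import re

# Assumed register parameters for 'r': the 16-bit registers plus 'f' (flags).
REG16 = {"ax", "bx", "cx", "dx", "sp", "bp", "si", "di",
         "ds", "es", "ss", "cs", "ip", "f"}


def parse_debug_command(text):
    """Return (command letter, parameter list) for one DEBUG command line."""
    text = text.strip().lower()           # DEBUG is not case sensitive
    cmd, rest = text[0], text[1:]         # single letter; space after it optional
    rest = re.sub(r"=\s*", "=", rest)     # treat '= 103' the same as '=103'
    params = rest.split()                 # parameters separated by blanks
    if cmd == "r" and params and params[0] not in REG16:
        raise ValueError(f"not a 16-bit register name: {params[0]}")
    return cmd, params


print(parse_debug_command("A 200"), parse_debug_command("a200"))
```

Both calls in the usage line produce the same result, which is the point of the rule that the space after the command letter is optional.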

Table 1. Some Debug Commands

Command       Character   Parameters
assemble      a           [address]
dump          d           [address range]
enter         e           address [list]
go            g           [=address] [list of alternative addresses]
proceed       p           [=address] [number]
quit debug    q           none
register      r           [register]
trace         t           [=address] [value]
unassemble    u           [range]
help          ?           none

Notes:
1. All commands are single characters, as shown in column 2 of the table. The third column shows the parameters; optional parameters are shown within square brackets.
2. When optional parameters are not given, a default value based on the current conditions is taken.
3. A summary of all the commands and their parameters can be displayed in debug with the command '?'.

Several examples in the next chapter will clarify the use of these commands. An example of the study of the g command is shown here. In this exercise you can see how to use not only the g command but several other commands as well.

-a                          ; assemble at the default address (no parameter given)
1377:0100 mov ax, 1234
1377:0103 ja 10a            ; jump if above (CF=0 and ZF=0), to 10A
1377:0105 jb 110            ; jump if below, i.e., if carry (CF=1), to 110
1377:0107 jz 120            ; jump if the zero flag is set (ZF=1), to 120
1377:0109                   ; simply press 'Enter' to exit from the assemble command
-r                          ; display registers
AX=0000  BX=0000  CX=0000  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000
DS=1377  ES=1377  SS=1377  CS=1377  IP=0100   NV UP EI PL NZ NA PO NC
1377:0100 B83412        MOV     AX,1234
-g =103 10a 110 120         ; start from 103 and halt at 10A, 110 or 120
AX=0000  BX=0000  CX=0000  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000
DS=1377  ES=1377  SS=1377  CS=1377  IP=010A   NV UP EI PL NZ NA PO NC
1377:010A 03DB          ADD     BX,BX
                            ; note: the instruction at 100 is not executed;
                            ; execution is from 103 only
-rip                        ; show register IP and alter it as indicated
IP 010A                     ; IP shown as it is
:100                        ; alter it to 100
-rf
NV UP EI PL NZ NA PO NC  -cy                      ; set the carry flag
-r
AX=0000  BX=0000  CX=0000  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000
DS=1377  ES=1377  SS=1377  CS=1377  IP=0100   NV UP EI PL NZ NA PO CY
1377:0100 B83412        MOV     AX,1234
-g = 103 10a 110 120
AX=0000  BX=0000  CX=0000  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000
DS=1377  ES=1377  SS=1377  CS=1377  IP=0110   NV UP EI PL NZ NA PO CY
1377:0110 BB2000        MOV     BX,0020
-rip
IP 0110
:100
-rf
NV UP EI PL NZ NA PO CY  -zr nc                   ; set zero and clear carry
-r
AX=0000  BX=0000  CX=0000  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000
DS=1377  ES=1377  SS=1377  CS=1377  IP=0100   NV UP EI PL ZR NA PO NC
1377:0100 B83412        MOV     AX,1234
-g =103 10a 110 120
AX=0000  BX=0000  CX=0000  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000
DS=1377  ES=1377  SS=1377  CS=1377  IP=0120   NV UP EI PL ZR NA PO NC
1377:0120 0000          ADD     [BX+SI],AL                         DS:0000=CD
                            ; the data at DS:[BX+SI] is shown
-q                          ; quit debug
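The three runs of g in this session differ only in CF and ZF, so the halting address is a pure function of those two flags. The Python sketch below encodes the conditions tested by JA, JB and JZ for this particular program only; first_stop is a hypothetical helper, not a model of DEBUG itself.

```python
def first_stop(cf: int, zf: int) -> int:
    """Address at which '-g =103 10a 110 120' halts for the given CF and ZF.

    Models only the conditional jumps at 103, 105 and 107 of the session;
    MOV AX,1234 at 100 is skipped because execution starts at 103.
    """
    if cf == 0 and zf == 0:   # 103: JA 10A, taken when above (CF=0 and ZF=0)
        return 0x10A
    if cf == 1:               # 105: JB 110, taken when below (CF=1)
        return 0x110
    # The only case left is CF=0, ZF=1, so JZ 120 at 107 is taken.
    return 0x120


for cf, zf in [(0, 0), (1, 0), (0, 1)]:
    print(f"CF={cf} ZF={zf} -> halts at {first_stop(cf, zf):03X}")
```

The three combinations printed are the three runs shown above (NC NZ, CY NZ, NC ZR), halting at 10A, 110 and 120 respectively. The third run also shows why JA is not simply "jump if no carry": with CF clear but ZF set, JA falls through.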

Also see Appendix B for a study of the proceed command of the debug.

The Macro Assembler MASM, and the Linker, LINK: The assembly language program is written using the edit command in the DOS environment and saved with the file extension .asm. It can then be assembled with the command MASM filename; for example, to assemble the file hex_to_bcd.asm, the command is:

masm hex_to_bcd;

The ; at the end tells the assembler not to ask any further questions about the files to be generated, so it generates only the object file, a machine language file with the extension .obj. For the above command, the file generated (assuming there are no errors in the ALP) will be hex_to_bcd.obj.

The object file contains the machine codes for the ALP, but the segment base addresses are not yet fixed. The file is therefore relocatable: every effective address, of an instruction or of data, is an offset relative to the base address held in one of the segment registers (cs, ds, es or ss). The program becomes completely executable, with all segments initialized and, for a multi-module program, with all the modules properly linked together, only after the link command. For a single-module program such as the one above, the link command is:

link hex_to_bcd;

If everything is correct, the link operation produces the executable file hex_to_bcd.exe. This file can be studied in the debug environment, step by step, using break points and so on. It can also be executed directly to completion as a command under DOS.
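Relocation works because of how the 8086 forms addresses: the 20-bit physical address is the 16-bit segment value multiplied by 16, plus the 16-bit offset. Offsets are fixed at assembly time, and only the segment values need to be supplied later, when the program is linked and loaded. A minimal Python sketch of that arithmetic follows; physical_address is an illustrative name.

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit real-mode 8086 address: segment * 16 + offset."""
    # The 8086 has only 20 address lines, so the sum wraps around at 1 MB.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1377, 0x0100)))  # 0x13870
# Two different segment:offset pairs can name the same byte:
print(physical_address(0x0B46, 0x0000) == physical_address(0x0B45, 0x0010))  # True
```

The second line is worth remembering for the segment integrity example later in this chapter, where cs = 0B46 is just ds + 1, so the code segment starts only 16 bytes after the start of the data segment.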

The Assembler Directives: The following skeleton of an ALP shows the main features of a simple .asm file, indicating many of the assembler directives used.

1   data segment
2       val_hex  dw 1234h, 567h, 0abch
3       val_dec  dw 3 dup (?)
4   data ends
5   stak segment stack
6       dw 256 dup (?)
7   tos label word
8   stak ends
9   code segment
10      assume cs:code, ds:data, es:data, ss:stak
11  start:  mov ax, data
    a.      mov ds, ax
    b.      mov es, ax
    c.      mov ax, stak
    d.      mov ss, ax
    e.      mov sp, offset tos
    f.      ; program instructions
12  code ends
13  end start

We shall now look at the above skeleton program line by line.

Line 1: The assembler directive used is segment. It makes the assembler open a new segment, and the word data is the name given to the segment. The format for opening a segment is the name of the segment followed by the word segment.

Line 2: val_hex is the name of the first variable stored. dw is the assembler directive define word, signifying that val_hex and the rest of the data on that line are words, i.e., 16-bit data; db or dd in that position would indicate define byte and define double word (32 bits), respectively. The numbers in the line initialize the three words at the beginning of the segment named data to the values given. Values can be given in binary (suffix b), decimal (the default) or hexadecimal (suffix h).

Line 3: val_dec is the name of the variable stored after the 3 words of line 2, and dw is again the define word directive. 3 dup (?) indicates that 3 items of data (here 3 words) are reserved here without being initialized; it means 3 duplicates of (any data word). Such uninitialized locations are kept for storing the results of the program. If required, these locations can all be loaded with some data, such as all 0s, by simply changing this part to 3 dup (0).

Line 4: data is the segment name, as we have seen already, and ends is the assembler directive that ends (closes) the segment, the data segment in this case.

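Line 5: stak is a segment name, and segment is the segment directive, which opens a new segment (named stak here), as we have already seen. stack marks this as the stack segment. The assembler passes this combine type on to the linker, which records the segment as the program's stack in the .exe file, so that DOS can set up ss and sp from it when loading the program; if no segment is so marked, LINK warns that there is no stack segment. The instructions in lines c to e below set up ss and sp explicitly as well.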

Line 6: dw is the define word directive, which we have already seen, and 256 dup (?) reserves 256 (= 100 hex) uninitialized words, i.e., 512 bytes. This is the memory provided for the stack in this program.

Line 7: tos is a label name, and label is the directive that makes tos a label at the current location, just after the 256 stack words (offset 200 hex in stak). word tells the assembler that the label tos is of type word.

Line 8: stak ends marks the end of the segment named stak.

Line 9: code segment indicates the start of a new segment named code.

Line 10: assume: this is an assembler directive. The program is written in the code segment, and to start it, the first instruction is fetched from the location pointed to by cs:ip, so both cs and ip must be valid from the beginning. Note that cs cannot be managed by the program, because the program itself cannot start without a cs being defined; cs and ip are set by DOS when it loads the .exe file, using the entry point given in the end directive (line 13). The assume directive does not itself load any segment register. It tells the assembler which segment register will hold the base of which segment, so that the assembler can address the labels and variables of each segment correctly and insert segment override prefixes where needed. cs:code tells the assembler that cs addresses the segment named code. ds:data, es:data and ss:stak give the same information for the other segment registers, but these registers must still be loaded by the program itself, as is done in lines a to e. Some assemblers take care of loading the other segment registers also, using slightly different types of directives. Some more ideas on this can be had by looking at the pr.asm program discussed in Appendix B at the end of this Chapter.

Line 11: start is a label used for reference purposes. mov ax, data is the first instruction of the program and begins the loading of the segment registers: the segment address of the segment named data (the paragraph at which it is loaded) is to be placed in the ds and es registers. Since the 8086 cannot move an immediate value directly into a segment register, this value is first moved into register ax, and from there into ds and es by the succeeding instructions.

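Lines a to e: These instructions initialize the segment registers ds, es and ss, and also the stack pointer sp. sp is loaded with the offset of tos, the top of the reserved stack area, because the 8086 stack grows toward lower addresses.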

Line f: From line f onwards, the real useful operations of the program are written.

Line 12: code ends tells the assembler that the segment named code has ended (ends is the end segment directive).

Line 13: end: The end directive tells the assembler that this is the end of the source to be assembled.

start: names the start label as the entry point of the program. Its offset is recorded in the .exe file, and when the program is loaded, ip is set to this offset (with cs set to the segment named code), so that execution starts from that address.
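The offsets the skeleton relies on, offset tos in particular, follow from the assembler's location counter: each dw item takes 2 bytes, dup multiplies the count, and label reserves nothing. The Python sketch below models only that bookkeeping for the data and stak segments of the skeleton; the function layout and its (name, bytes per item, count) entries are hypothetical, not part of MASM.

```python
def layout(entries):
    """Assign segment offsets the way a location counter does.

    Each entry is (name, bytes_per_item, item_count); name None stands for an
    unnamed line, and item_count 0 stands for a 'label' that reserves nothing.
    Returns (offsets by name, total size of the segment in bytes).
    """
    offsets = {}
    here = 0                                  # location counter, starts at 0
    for name, size, count in entries:
        if name is not None:
            offsets[name] = here
        here += size * count
    return offsets, here


data_offsets, data_size = layout([
    ("val_hex", 2, 3),                        # dw 1234h, 567h, 0abch
    ("val_dec", 2, 3),                        # dw 3 dup (?)
])
stak_offsets, stak_size = layout([
    (None, 2, 256),                           # dw 256 dup (?)
    ("tos", 2, 0),                            # tos label word
])
print(data_offsets, data_size)                # {'val_hex': 0, 'val_dec': 6} 12
print(hex(stak_offsets["tos"]))               # 0x200
```

So mov sp, offset tos in line e loads sp with 200h, just above the last stack word; the first push then decrements sp to 1FEh and stores its word there.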

The description above is, in brief, an introduction to assembly language programming (ALP). In this chapter, we have studied the importance of assembly language programming in the context of embedded systems, as well as for writing efficient programs and special programs. We have seen the basics of DEBUG and the use of the MASM and LINK programs. With this knowledge of the tools, we can now proceed in the next chapter to the study of the instruction set and other details of the processor, to arm ourselves fully for writing good programs at the ALP level.


The Segment Definitions and Segment Integrity Aspects: With the assembler directives segment and ends, it may look as though MASM will take care of maintaining the segment limits and prevent operations on other segments from overwriting and damaging the integrity of any segment. Unfortunately, it is not so. The assembler simply converts the program, as given, into a suitable machine language program. If the program contains instructions that violate segment integrity, either going beyond the area defined for a segment or crossing over into regions defined for other segments, the assembler cannot check this, since it happens at run time and is not known at assembly time. The example below demonstrates this. The programmer should therefore work out in advance the (maximum) requirements of the data, extra and stack segments, and make adequate provision for them in the program. The processor hardware will not check these aspects while the program runs. Later processors, from the 80286 onwards, guard against these eventualities in the protected mode of operation by defining the segment limits clearly and providing hardware to prevent such infringements of segment integrity and segment boundaries. The demonstration on the unprotected 8086 system is given below:

8086 does not control the segment size, nor does it protect the integrity of the segments. These are the programmer’s responsibilities.

; The program to indicate that the 8086 has no control over segment overflows,
; nor does it maintain segment integrity.
data segment
base dw 5 dup(3)       ; segment as defined has only 5 words in it, with data 0003
data ends
code segment
assume cs:code, ds:data
start: mov ax, data
       mov ds, ax
       lea bx, base
       mov cx, 9
back:  mov ax, [bx]
       add ax, 10
       mov [bx], ax
       add bx, 2
       loop back       ; this loop forces 9 words into the data segment
       int 01
code ends
end start

; Testing in debug
-u 0 18
0B46:0000 B8450B    MOV AX,0B45    ; note cs = 0B46, is just (ds + 1)
0B46:0003 8ED8      MOV DS,AX
0B46:0005 8D1E0000  LEA BX,[0000]
0B46:0009 B90900    MOV CX,0009    ; this forces 9 iterations of the loop
0B46:000C 8B07      MOV AX,[BX]
0B46:000E 050A00    ADD AX,000A
0B46:0011 8907      MOV [BX],AX
0B46:0013 83C302    ADD BX,+02
0B46:0016 E2F4      LOOP 000C
0B46:0018 CD01      INT 01
-g 0e
AX=0003 BX=0000 CX=0009 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=0B45 ES=0B35 SS=0B45 CS=0B46 IP=000E NV UP EI PL NZ NA PO NC
0B46:000E 050A00    ADD AX,000A
-d 0 1f
0B45:0000  03 00 03 00 03 00 03 00-03 00 00 00 00 00 00 00   ................
0B45:0010  B8 45 0B 8E D8 8D 1E 00-00 B9 09 00 8B 07 05 0A   .E..............
; the defined data segment, then undefined extra space, then the space defined as the code segment
-g 18
AX=45C2 BX=0012 CX=0000 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=0B45 ES=0B35 SS=0B45 CS=0B46 IP=0018 NV UP EI PL NZ NA PE NC
0B46:0018 CD01      INT 01
-d 0 1f
0B45:0000  0D 00 0D 00 0D 00 0D 00-0D 00 0A 00 0A 00 0A 00   ................
0B45:0010  C2 45 0B 8E D8 8D 1E 00-00 B9 09 00 8B 07 05 0A   .E..............
; the data segment now has 9 words; the last word has overwritten the start of the
; code segment, changing the program itself, as the listing after execution shows:
-u 0 18
0B46:0000 C2450B    RET 0B45       ; code segment overwritten
0B46:0003 8ED8      MOV DS,AX
0B46:0005 8D1E0000  LEA BX,[0000]
0B46:0009 B90900    MOV CX,0009
0B46:000C 8B07      MOV AX,[BX]
0B46:000E 050A00    ADD AX,000A
0B46:0011 8907      MOV [BX],AX
0B46:0013 83C302    ADD BX,+02
0B46:0016 E2F4      LOOP 000C
-q
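The overrun above can be reproduced with a short Python model. This is a sketch under stated assumptions, not an emulator: memory is taken as a flat 1 MB real-mode byte array, the segment values 0B45 (data) and 0B46 (code) are copied from the debug session, and only the first three bytes of the code segment are loaded. The helper names phys and run_overrun_demo are hypothetical.

```python
# Hypothetical sketch: real-mode memory as one flat 1 MB byte array.
# A segment register only supplies a base address; nothing marks a segment's end.

def phys(seg: int, off: int) -> int:
    """Real-mode physical address: segment * 16 + offset, wrapped to 20 bits."""
    return ((seg << 4) + (off & 0xFFFF)) & 0xFFFFF


def run_overrun_demo(iterations: int = 9):
    ds, cs = 0x0B45, 0x0B46                 # values from the debug session
    mem = bytearray(1 << 20)                # the whole 1 MB address space

    # data segment: "base dw 5 dup(3)"; bytes 0AH-0FH stay zero (padding)
    for i in range(5):
        mem[phys(ds, 2 * i)] = 0x03
    # start of the code segment: B8 45 0B  (MOV AX,0B45)
    start = phys(cs, 0)
    mem[start:start + 3] = bytes([0xB8, 0x45, 0x0B])

    bx = 0                                  # lea bx, base
    for _ in range(iterations):             # mov cx,9 ... loop back
        addr = phys(ds, bx)
        word = mem[addr] | (mem[addr + 1] << 8)   # mov ax,[bx] (little-endian)
        word = (word + 10) & 0xFFFF               # add ax,10
        mem[addr] = word & 0xFF                   # mov [bx],ax
        mem[addr + 1] = word >> 8
        bx += 2                                   # add bx,2
    data_bytes = bytes(mem[phys(ds, 0):phys(ds, 0x10)])
    code_bytes = bytes(mem[start:start + 3])
    return data_bytes, code_bytes, bx


data, code, bx = run_overrun_demo()
print(data.hex(" "))   # 0d 00 0d 00 0d 00 0d 00 0d 00 0a 00 0a 00 0a 00
print(code.hex(" "))   # c2 45 0b  (now disassembles as RET 0B45)
print(hex(bx))         # 0x12
```

Because the physical address is simply segment * 16 + offset, offset 0010 in the data segment and offset 0000 in the code segment are the same byte; nothing in the address calculation marks where one segment ends.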

The program above brings out one of the reasons for incorporating protection features in the processor. In the absence of protection, a user program may destroy itself during execution. It is not difficult to see that one user’s program may also destroy another’s program. Advanced processors, including the successors of the 8086 starting from the 80286, have protection features included in the hardware design of the processor.

EXERCISES

1. What are the advantages of programming in the assembly language?
2. Why do people find it difficult to write assembly language programs as compared to writing in an HLL? How could the difficulty be mitigated?
3. What is the purpose of writing comments? Give an example of a wrong ALP comment, and indicate how this comment can be corrected.
4. Show how debug can be used to study the instructions: (i) ADD AX, BX (ii) MUL BX (iii) SAL CX, 1 (iv) XOR AX, AX
5. How would you use MASM to obtain the list file along with the object file?

==00==


APPENDIX 1.A

8085 Operations for the hex to ASCII conversion:

The data in register A is a single hex digit, that is, one of 0, 1, 2, …, 0F hex.

Comparing this with 0A hex divides the digits into two classes: 0 to 9 hex set the carry flag, while 0A to 0F hex leave the carry flag reset. The second instruction causes 0D0 hex, the complement (that is, the NOT) of 2F hex, to be added to the contents of register A, with the carry flag inverted as the carry input. Recall that subtraction is done by adding the complement of the subtrahend to the accumulator, with the carry input inverted. Thus, if the accumulator initially held 0 to 9 hex, 0D0 hex is added to it; otherwise 0D1 hex is added, depending on the carry input to the second instruction. This produces the following results:

Group 1. If A register had 0 to 9 initially, it will now have 0D0 to 0D9, with auxiliary carry and the carry flags both reset.

Group 2. Else, if it had 0A to 0E hex, it will now have 0DB to 0DF hex, with both the carry and auxiliary carry flags cleared.

Group 3. Else, for the input 0F hex, it will now have 0E0 hex with auxiliary carry set and carry reset.

Now consider the operation of DAA. As we know, the DAA logic depends on the accumulator data and on the carry and auxiliary carry flags. For numbers in group 1 above, DAA adds 60 hex to give 30 to 39 hex in the accumulator. Numbers in group 2 have 66 hex added to them, giving 41 to 45 hex. The number in group 3 also has 66 hex added (because the auxiliary carry flag is set) and becomes 46 hex. It can easily be seen that the result is the conversion of the hex digits 0 to 0F in the accumulator to the ASCII characters ‘0’ to ‘F’.
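The flag reasoning above can be checked with a small Python model. It assumes the 8085 program is CPI 0AH / SBI 2FH / DAA (the 8085 counterparts of the 8086 CMP/SBB/DAA listing that follows), that the 8085 subtracts by adding the complement of the operand with the inverted carry, and that AC after the subtraction is the adder’s raw half carry, as described above. The DAA shown is a simplified version that is sufficient for these inputs; the function names are hypothetical.

```python
# Hypothetical model of the assumed 8085 sequence CPI 0AH / SBI 2FH / DAA.

def sbi_8085(a: int, imm: int, cy: bool):
    """A <- A - imm - CY, computed as A + (NOT imm) + (NOT CY)."""
    carry_in = 0 if cy else 1
    total = a + (~imm & 0xFF) + carry_in
    ac = (a & 0x0F) + (~imm & 0x0F) + carry_in > 0x0F   # adder half carry
    borrow = total <= 0xFF                  # no carry out of the adder = borrow
    return total & 0xFF, ac, borrow


def daa_8085(a: int, ac: bool, cy: bool):
    """Simplified DAA: correct the low nibble first, then the high nibble."""
    if (a & 0x0F) > 9 or ac:
        a += 0x06
    if (a >> 4) > 9 or cy:
        a += 0x60
        cy = True
    return a & 0xFF, cy


def hex_digit_to_ascii_8085(digit: int) -> int:
    cy = digit < 0x0A                       # CPI 0AH: CY set for 0..9
    a, ac, cy = sbi_8085(digit, 0x2F, cy)   # 0..9 -> D0..D9, 0A..0E -> DB..DF, 0F -> E0
    a, _ = daa_8085(a, ac, cy)              # adds 60H or 66H as described above
    return a


print("".join(chr(hex_digit_to_ascii_8085(d)) for d in range(16)))  # 0123456789ABCDEF
```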

8086 operations of this program: Shown below is a demonstration of an 8086 program on the same lines, with the results of executing it step by step.

CASE 1: LISTING
13D5:0000 B008      MOV AL,08
13D5:0002 3C0A      CMP AL,0A
13D5:0004 1C2F      SBB AL,2F
13D5:0006 27        DAA

WORKING OF THE PROGRAM CASE 1:
13D5:0000 B008      MOV AL,08
AX=0008 BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0002 NV UP EI PL NZ NA PO NC
13D5:0002 3C0A      CMP AL,0A
AX=0008 BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0004 NV UP EI NG NZ AC PO CY
13D5:0004 1C2F      SBB AL,2F
AX=00D8 BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI NG NZ AC PE CY
13D5:0006 27        DAA
AX=003E BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ AC PO CY

CASE 3: LISTING
13D5:0000 B00F      MOV AL,0F
13D5:0002 3C0A      CMP AL,0A
13D5:0004 1C2F      SBB AL,2F
13D5:0006 27        DAA

WORKING OF THE PROGRAM CASE 3:
13D5:0000 B00F      MOV AL,0F
AX=000F BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0002 NV UP EI PL NZ NA PO NC
13D5:0002 3C0A      CMP AL,0A
AX=000F BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0004 NV UP EI PL NZ NA PE NC
13D5:0004 1C2F      SBB AL,2F
AX=00E0 BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI NG NZ NA PO CY
13D5:0006 27        DAA
AX=0040 BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ NA PO CY

Two of the three cases are studied in this debug demonstration: case 1, with the entry in the group 0 to 9 hex, and case 3, for the number 0F hex. In case 1 we get 3E instead of 38 (6 more), and in case 3 we get 40 instead of 46 (6 less). In group 2, SBB leaves DB to DF hex in the AL register (this case is not shown above), and 66 is added to give the correct ASCII code. Where the result goes wrong, the culprit is the auxiliary carry flag (shown as AC or NA in the register display) after the SBB instruction.

The interesting feature to note here is that when subtraction is performed by 2’s complement addition, the carry of this addition must be complemented to obtain the real borrow of the subtraction at every bit position, as can easily be verified. However, neither the 8085 nor the 8086 records the carry at each bit stage; only the carry at the half-byte stage (auxiliary carry) and the final byte stage carry are used. The 8085 provides no decimal adjustment for subtraction, while the 8086 does (DAS). Consequently the 8085 ALU (arithmetic logic unit) does not bother to correct the auxiliary carry for subtraction, because, as a rule, the auxiliary carry is not used after subtraction in the 8085. We have used it here, sort of illegally. The 8086 keeps the auxiliary carry at the correct (borrow) value to accommodate the DAS operation. This is why 6 is added to the correct value for group 1 numbers, and the result is 6 less for the group 3 number. An equivalent program for the 8086 could work if we complement the auxiliary carry after the SBB instruction to simulate the 8085 ALU behavior. However, this requires 3 additional instructions:

LAHF            ; load the lower byte of the flag register into AH
XOR AH, 10H     ; complement the auxiliary carry flag (bit 4 of the flags)
SAHF            ; store AH back as the lower byte of the flag register

This makes the program a little bigger. Using the ideas of this program, a more efficient assembly language program can certainly be designed. A program more suitable for hex to ASCII conversion on the 8086, based on these ideas, is:

CMP AL, 0AH
CMC
ADC AL, 30H
DAA

The reader can easily work out how this program works. It can conveniently be used as a macro (see chapter 4 for macros) to translate a hex digit in AL into its ASCII equivalent. The program avoids the conditional jump that would normally be used for converting hex to ASCII, as shown below; a conditional jump would require more time to process. The conventional hex to ASCII program is as follows:

      CMP AL, 0AH
      JB DOWN
      ADD AL, 7
DOWN: ADD AL, 30H

But this program uses a conditional jump instruction, which generally takes more time for execution.
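The 8086 behaviour discussed in this appendix can be modelled in the same way. In this hypothetical Python sketch, SBB sets AF to the true borrow out of bit 3 (as the 8086 does), and DAA follows the rules Intel documents for that instruction. The model reproduces the wrong results 3E and 40, shows that complementing AF (the LAHF/XOR/SAHF patch) corrects them, and checks that the branch-free CMP/CMC/ADC/DAA version agrees with the conventional jump version for every hex digit. All function names are illustrative assumptions.

```python
# Hypothetical model of the 8086 flag behaviour (an illustration, not an emulator).

def daa(al: int, cf: bool, af: bool):
    """8086 DAA: adjust the low nibble, then the high nibble based on the old AL and CF."""
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf, af


def sbb_version(digit: int, complement_af: bool = False) -> int:
    cf = digit < 0x0A                       # CMP AL,0AH
    diff = digit - 0x2F - cf                # SBB AL,2FH
    af = (digit & 0x0F) - 0x0F - cf < 0     # 8086 AF: true borrow out of bit 3
    al, cf = diff & 0xFF, diff < 0
    if complement_af:                       # LAHF / XOR AH,10H / SAHF
        af = not af
    return daa(al, cf, af)[0]


def cmc_adc_version(digit: int) -> int:
    cf = not (digit < 0x0A)                 # CMP AL,0AH then CMC
    total = digit + 0x30 + cf               # ADC AL,30H
    af = (digit & 0x0F) + cf > 0x0F         # low nibble of 30H is 0
    return daa(total & 0xFF, total > 0xFF, af)[0]


def branch_version(digit: int) -> int:
    if digit >= 0x0A:                       # JB DOWN is not taken
        digit += 7                          # ADD AL,7
    return digit + 0x30                     # DOWN: ADD AL,30H


print(hex(sbb_version(0x08)), hex(sbb_version(0x0F)))      # 0x3e 0x40
print("".join(chr(cmc_adc_version(d)) for d in range(16)))  # 0123456789ABCDEF
```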

Exercises of this type can give a lot of insight into the design aspects of the processor sub-units.

APPENDIX 1.B

STUDY OF THE PROCEED COMMAND OF DEBUG

Note: the file is named pr.asm. To understand the segment relations, see the list file also (pr.lst file).

The pr.asm program studied

data_here segment
asc db 16 dup(0)
data ends
code segment
assume cs:code            ; note other segments not indicated. It will
                          ; make the program rather difficult to follow or debug.
start: mov ax,data_here
       mov es, ax         ; ‘data_here’ now becomes the extra segment, es.
                          ; ds segment will be separate now.
       mov di,offset asc  ; offset in the ‘data’ (es) segment
       cld
       mov cx,16
       mov bl,0
back:  mov al, bl
       call hasc
       stosb
       inc bl
       loop back
       int 1
hasc   proc near
       cmp al,10
       cmc
       adc al,30h
       daa
       ret
hasc   endp
code ends
end start

The pr.exe program in the debug environment
-u 0 20
13D6:0000 B8D513    MOV AX,13D5
13D6:0003 8EC0      MOV ES,AX
13D6:0005 BF0000    MOV DI,0000
13D6:0008 FC        CLD
13D6:0009 B91000    MOV CX,0010    ; this is in hex (16 decimal = 10 hex)
13D6:000C B300      MOV BL,00
13D6:000E 8AC3      MOV AL,BL
13D6:0010 E80700    CALL 001A
13D6:0013 AA        STOSB
13D6:0014 FEC3      INC BL
13D6:0016 E2F6      LOOP 000E
13D6:0018 CD01      INT 01
13D6:001A 3C0A      CMP AL,0A      ; procedure starts here
13D6:001C F5        CMC
13D6:001D 1430      ADC AL,30
13D6:001F 27        DAA
13D6:0020 C3        RET            ; procedure ends

-g 10
AX=1300 BX=0000 CX=0010 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13D5 SS=13D5 CS=13D6 IP=0010 NV UP EI PL NZ NA PO NC
13D6:0010 E80700    CALL 001A      ; note es ≠ ds and es = ‘data_here’
-p 4                               ; execute the p command 4 times serially
AX=1330 BX=0000 CX=0010 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13D5 SS=13D5 CS=13D6 IP=0013 NV UP EI PL NZ NA PE NC
13D6:0013 AA        STOSB          ; p1 over
AX=1330 BX=0000 CX=0010 DX=0000 SP=0000 BP=0000 SI=0000 DI=0001
DS=13C5 ES=13D5 SS=13D5 CS=13D6 IP=0014 NV UP EI PL NZ NA PE NC
13D6:0014 FEC3      INC BL         ; p2 over
AX=1330 BX=0001 CX=0010 DX=0000 SP=0000 BP=0000 SI=0000 DI=0001
DS=13C5 ES=13D5 SS=13D5 CS=13D6 IP=0016 NV UP EI PL NZ NA PO NC
13D6:0016 E2F6      LOOP 000E      ; p3 over, p4 will complete the loop
AX=1346 BX=0010 CX=0000 DX=0000 SP=0000 BP=0000 SI=0000 DI=0010
DS=13C5 ES=13D5 SS=13D5 CS=13D6 IP=0018 NV UP EI PL NZ AC PO NC
13D6:0018 CD01      INT 01         ; p4 over; INT 01 to be executed next
-d es:0 f                          ; note the result is in the es segment and not in ds
13D5:0000  30 31 32 33 34 35 36 37-38 39 41 42 43 44 45 46   0123456789ABCDEF
; the display at the end of the above line indicates the characters printed
-q

The pr.lst file

Microsoft (R) Macro Assembler Version 5.10    1/19/7    Page 1-1

0000                      data_here segment
0000  0010[               asc db 16 dup(0)
          00
             ]
0010                      data ends
0000                      code segment
                          assume cs:code
0000  B8 ---- R           start: mov ax,data_here
0003  8E C0               mov es, ax
0005  BF 0000 R           mov di,offset asc
0008  FC                  cld
0009  B9 0010             mov cx,16
000C  B3 00               mov bl,0
000E  8A C3               back: mov al, bl
0010  E8 001A R           call hasc
0013  AA                  stosb
0014  FE C3               inc bl
0016  E2 F6               loop back
0018  CD 01               int 1
001A                      hasc proc near
001A  3C 0A               cmp al,10
001C  F5                  cmc
001D  14 30               adc al,30h
001F  27                  daa
0020  C3                  ret
0021                      hasc endp
0021                      code ends
                          end start

Microsoft (R) Macro Assembler Version 5.10    1/19/7    Symbols-1

Segments and Groups:
        N a m e                      Length   Align   Combine Class
CODE . . . . . . . . . . . . .       0021     PARA    NONE
DATA_HERE . . . . . . . . . .        0010     PARA    NONE

Symbols:
        N a m e                      Type     Value   Attr
ASC . . . . . . . . . . . . . .      L BYTE   0000    DATA_HERE   Length = 0010
BACK . . . . . . . . . . . . . .     L NEAR   000E    CODE
HASC . . . . . . . . . . . . . .     N PROC   001A    CODE        Length = 0007
START . . . . . . . . . . . . .      L NEAR   0000    CODE
@CPU . . . . . . . . . . . . . .     TEXT  0101h
@FILENAME . . . . . . . . . . .      TEXT  pr
@VERSION . . . . . . . . . . . .     TEXT  510

26 Source Lines
26 Total Lines
11 Symbols
47090 + 412122 Bytes symbol space free
0 Warning Errors
0 Severe Errors

Copyright © 2008 K M Hebbar


2. REGISTER SET AND INSTRUCTION SET OF 8086

In this chapter, we shall look at the register set of 8086, as accessible to the programmer, and then we shall have a detailed look at the instructions; some of the instructions do require a little bit of appreciation of the actual situation where the instructions become useful (this is a general feature of all CISC – complex instruction set computing – type of processors). Where required, such situations are examined with worked out examples.

Register set of 8086 accessible to programmers

1. General purpose:
   ax (16 bits) or ah:al (8 bits each) – accumulator
   bx (16 bits) or bh:bl (8 bits each) – base register
   cx (16 bits) or ch:cl (8 bits each) – counter and loop control register
   dx (16 bits) or dh:dl (8 bits each) – extended accumulator and I/O address register
2. Pointers and index registers:
   si (16 bits) – source index register
   di (16 bits) – destination index register
   bp (16 bits) – base pointer (for stack frame base)
   sp (16 bits) – stack pointer (pointer for stack top)
3. Segment registers:
   cs (16 bits) – code segment base pointer
   ds (16 bits) – data segment base pointer
   es (16 bits) – extra segment base pointer
   ss (16 bits) – stack segment base pointer
4. Other utility registers:
   ip (16 bits) – instruction pointer
   f or status (16 bits) – flag or status register
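The pairing of a 16-bit general purpose register with its two 8-bit halves can be pictured with a small, hypothetical Python class: writing al (or ah) changes only one byte of ax, and reading ax always sees both halves. The class and attribute names are illustrative only.

```python
# Hypothetical model of one 8086 general purpose register (e.g. ax with ah/al).

class GeneralRegister:
    def __init__(self, value: int = 0):
        self.x = value & 0xFFFF             # the full 16-bit register (e.g. ax)

    @property
    def h(self) -> int:                     # high byte (e.g. ah)
        return self.x >> 8

    @h.setter
    def h(self, value: int) -> None:
        self.x = ((value & 0xFF) << 8) | (self.x & 0x00FF)

    @property
    def l(self) -> int:                     # low byte (e.g. al)
        return self.x & 0xFF

    @l.setter
    def l(self, value: int) -> None:
        self.x = (self.x & 0xFF00) | (value & 0xFF)


a = GeneralRegister(0x1234)
print(hex(a.h), hex(a.l))   # 0x12 0x34
a.l = 0xFF                  # like MOV AL,0FFH: only the low byte changes
print(hex(a.x))             # 0x12ff
```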

Note that on the 8086, 16-bit data are called words (words are used to represent data or addresses), 8-bit data are called bytes or half words (bytes represent data or ASCII characters), and 4-bit quantities are called nibbles (nibbles can represent BCD digits or hex digits). Also note that register names are not case sensitive in ALP: ax and AX indicate the same register, and registers may be written in capital letters or in lower case in assembly language programs.

Discussion on the use of registers: We will start with the general purpose registers. Although the registers ax, bx, cx and dx are called general purpose registers for handling data (either 8 bits or 16 bits), they have some special capabilities. The registers ax (16 bits) and al (8 bits) are used as accumulators, capable of doing certain specific operations. When AX or AL is used this way, it is implied in the instruction without being specifically indicated. These registers act as one of the source operands as well as the destination for the result of the instruction. For example:

MUL CX means that the word in ax (implied, and not directly specified in the instruction) is to be multiplied by the word in cx (specified in the instruction) to get a double word product; the result goes to the implied registers dx (high word of the product, in the extended accumulator) and ax (low word of the product, in the accumulator).
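A minimal Python sketch of the word form of MUL, assuming unsigned operands: ax is the implied multiplicand, and the 32-bit product is split into dx (high word) and ax (low word). The function name mul_word is hypothetical. CF and OF are set when dx is nonzero, which is how the 8086 reports that the product needs more than 16 bits.

```python
# Hypothetical model of MUL with a 16-bit source (e.g. MUL CX): DX:AX <- AX * src.

def mul_word(ax: int, src: int):
    product = (ax & 0xFFFF) * (src & 0xFFFF)   # unsigned 16 x 16 -> 32 bits
    dx = product >> 16                         # high word of the product -> DX
    ax = product & 0xFFFF                      # low word of the product  -> AX
    cf = of = dx != 0                          # set when the high word is significant
    return dx, ax, cf, of


dx, ax, cf, of = mul_word(0x1234, 0x0100)      # AX = 1234H, CX = 0100H
print(hex(dx), hex(ax), cf)                    # 0x12 0x3400 True
```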

There are many such instructions which use ax and al as implied accumulators, as explained later while discussing the instructions in detail. In multiplication and division of word-size data, register dx is used as the high word extension of the accumulator, holding the high word of the double word product in multiplication and of the double word dividend in division. The registers ax/al are used as the accumulator in string instructions like lodsb/w or stosb/w.

The register bx is called the base register. As a 16-bit register, it can be used to store an address. In the instruction XLAT (translate), it is used as the implied offset address, in the data segment, of the look-up table for translation. The other register associated with XLAT is al: before execution, al holds the byte index into the look-up table, and after execution the table entry is placed in al. The instruction can therefore realize any arbitrary Boolean function with up to eight inputs and eight outputs, using a 256-byte table.
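XLAT can be modelled as a single indexed read: al is replaced by the byte at offset bx + al in the data segment. In this hypothetical Python sketch, the data segment is a 64 KB bytearray, the table is placed at offset 0100H (an arbitrary choice), and the 8-input, 8-output function chosen for illustration reverses the order of the bits.

```python
# Hypothetical model of XLAT: AL <- DS:[BX + AL].

def xlat(data_segment: bytearray, bx: int, al: int) -> int:
    return data_segment[(bx + al) & 0xFFFF]    # offset wraps within the 64 KB segment


# Any 8-input, 8-output Boolean function is a 256-byte table.
# Illustrative function: reverse the order of the 8 bits.
TABLE_OFFSET = 0x0100
ds = bytearray(0x10000)
for i in range(256):
    ds[TABLE_OFFSET + i] = int(f"{i:08b}"[::-1], 2)

print(hex(xlat(ds, TABLE_OFFSET, 0x01)))   # 0x80
print(hex(xlat(ds, TABLE_OFFSET, 0xF0)))   # 0xf
```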

The register cx is called the counter register. It is used as a counter in handling arrays and with string instructions when they are repeatedly executed, with the prefix ‘rep’. It is also used as a loop counter, while executing loops a number of times. The register cl is used as a counter to control the number of bits of shifts/ rotations in shift and rotate instructions.

The register dx is used as an extension of the accumulator, as already mentioned. It also has the additional function of holding the I/O port address for indirect addressing of the ports. The port address can be up to 16 bits. When the port address does not exceed eight bits, the ports can be addressed directly (with the address given in the instruction itself), but when the port address is more than eight bits, the port must be addressed indirectly through register dx.

The general purpose registers ax (ah, al), bx (bh, bl), cx (ch, cl) and dx (dh, dl) are used for 16 or 8-bit data handling and are capable of performing arithmetic and logic operations, shift and rotate operations on the data stored in them. In this sense, they are all general purpose data handling registers.

Pointers and Index registers: We shall now look at the next set of registers, five in number, which are used mainly for handling addresses. They are all 16-bit registers which can store the 16-bit offset address in a segment. Two of them are index registers: si and di (source index and destination index); the other three are pointer registers: bp, sp and ip (base pointer, stack pointer and instruction pointer).

The registers si and di normally hold addresses of data in the data segment; in string instructions, however, si refers to the source address in the data segment and di refers to the destination address in the extra segment. We shall discuss later the method of addressing data using a segment with an offset address in registers.


The registers bp (normally) and sp (always) refer to address of data in the stack segment, while the ip refers always to the address of the instruction in the code segment.

The registers si, di, bp and sp can all take part in 16-bit arithmetic and logic operations, like the registers ax, bx, cx and dx. Although addition and subtraction are useful for handling addresses, it is difficult to see how multiply, divide and logical operations could be used for address handling. It simply means that these registers can also serve as data registers for 16-bit data when they are not used for addresses. Unlike ax, with its halves ah and al, they cannot be used as two separate 8-bit registers.

The register ip is meant exclusively for pointing to the next instruction to be executed. As all instructions are in the segment cs, ip is always used with cs to generate the instruction address. Although ip takes part in 16-bit addition/subtraction through instructions that do not appear to perform such an operation (relative jumps, for example), it cannot enter into any other arithmetic (like multiply) or logic (like ex-or) operation. In short, it cannot be used to handle data at all. Whatever addition or subtraction it takes part in is limited to obtaining the address of the next instruction by adding an integer to, or subtracting one from, the current contents of ip. We shall see more of this while discussing the jump instructions.

Segment registers: There are four segment registers: cs (code segment), ds (data segment), es (extra segment) and ss (stack segment). The register cs, as we have already seen, indicates where the program instructions are located. The data and extra segments indicate where the data used by the program, including results, are stored. The need for two segments for data storage will be brought out when we discuss string instructions. The register ss indicates the memory area used for the stack. As we shall see later, the stack is a very useful data structure which makes it convenient to perform certain operations during the execution of programs.

The flag register: The flags are single bits of information based on the nature of the result of the immediately preceding arithmetic or logic operation. The 8086 has six such status flags, which are updated when arithmetic or logic instructions are executed.

The overflow flag indicates that the addition or subtraction of numbers interpreted as signed has resulted in a value which cannot be represented in the destination register; one more bit would be necessary to represent the value correctly.

Example: Consider 4-bit numbers in 4-bit registers to simplify the explanation. Let the numbers added be -4 and -5, represented as 1100 and 1011 in 2’s complement notation in the 4-bit registers. Note that the processor does not know about signed numbers. All it does is add the binary numbers, producing the result 0111 with a carry of 1. However, the result 0111 in the destination register is not correct: it is +7, which cannot be the sum of -4 and -5. The correct result is -9. Recall that -8 is the lowest negative number that can be accommodated in a 4-bit register. The number -9 requires 5 bits or more, since it can only be written as 10111. Here the carry happens to supply the missing fifth bit, but treating the carry as the overflow indication works only by coincidence, as the next example shows. In this example, we add +7 (0111) and +2 (0010); the result obtained is 1001, with a carry of 0. There is no carry, but if we interpret the data as a signed 2’s complement number of 4 bits, the result is -7, which is not correct. The correct result is 01001, which requires 5 bits or more to be represented correctly. An overflow has occurred, but the carry bit is not set. Thus 2’s complement overflow and carry are not the same, and a separate indication of 2’s complement add/subtract overflow is needed. This is the overflow flag: when it is set, it indicates that the preceding add/subtract operation has produced a number which cannot be represented completely in the result register, if the numbers are interpreted as signed numbers in 2’s complement notation.
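The two 4-bit examples can be checked with a short Python function that computes both flags from the same addition. It is a sketch (the name add_with_flags is hypothetical): CF is the carry out of the top bit, and OF is set when both operands have the same sign but the result has the opposite sign, the usual rule for 2’s complement overflow. The third call shows the opposite case, a carry with no overflow.

```python
# Hypothetical model of a 'bits'-wide binary add, reporting carry and overflow.

def add_with_flags(a: int, b: int, bits: int = 4):
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    a, b = a & mask, b & mask               # 2's complement bit patterns
    total = a + b
    result = total & mask                   # what the destination register keeps
    cf = total > mask                       # carry out of the top bit (unsigned overflow)
    # signed overflow: the operands share a sign that the result does not have
    of = bool((a ^ result) & (b ^ result) & sign_bit)
    return result, cf, of


print(add_with_flags(-4, -5))   # (7, True, True)   1100 + 1011 = 0111, carry 1
print(add_with_flags(7, 2))     # (9, False, True)  0111 + 0010 = 1001, no carry
print(add_with_flags(-1, 1))    # (0, True, False)  carry, but no signed overflow
```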

The carry flag indicates the same condition for numbers interpreted as unsigned. It is the carry resulting from the normal binary add (for subtraction it acts as a borrow), and it indicates that the unsigned numbers added or subtracted have produced a result which cannot be represented fully in the destination register.

The zero flag indicates that an add/subtract operation has produced a zero result. The zero applies only to the data stored in the destination register, and not to the complete result of the arithmetic operation. To clarify this, consider adding 80 hex to 80 hex. The result is 100 hex, but only 00 hex is stored in the 8-bit register, so the zero flag is set, along with the carry flag. The following experiment in the debug environment shows this.

-a                  ; start assembling at the default address 100 in CS
1377:0100 add al,ah
1377:0102
-rax
AX 0000
:8080               ; load ax with 8080, that is, ah and al with 80 hex each
-r                  ; display all registers
AX=8080 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 00E0      ADD AL,AH       ; add al and ah
AX=8000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0102 OV UP EI PL ZR NA PE CY
-q                  ; quit debug

Note: The complete result of this addition is not zero, as shown by the overflow and carry flags being set. ZF simply indicates that the result that has gone into the destination register AL is zero. When using the zero flag to decide that the result of an addition is zero, one has to be cautious of this possibility.
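The debug result can be reproduced with a hypothetical Python model of an 8-bit ADD that computes the six status flags and prints them with the two-letter names DEBUG uses (OV/NV, NG/PL, ZR/NZ, AC/NA, PE/PO, CY/NC). It shows that ZF is derived only from the byte kept in AL, while CF and OF record what was lost. The function name is illustrative.

```python
# Hypothetical model of an 8-bit ADD with DEBUG-style flag names.

def add8_flags(dst: int, src: int) -> str:
    total = dst + src
    result = total & 0xFF                   # only 8 bits fit in AL
    flags = {
        "OV": bool((dst ^ result) & (src ^ result) & 0x80),  # signed overflow
        "NG": bool(result & 0x80),                            # sign bit of AL
        "ZR": result == 0,                    # AL is zero, not necessarily the sum
        "AC": (dst & 0x0F) + (src & 0x0F) > 0x0F,             # carry out of bit 3
        "PE": bin(result).count("1") % 2 == 0,                # even number of 1s
        "CY": total > 0xFF,                                   # carry out of bit 7
    }
    clear = {"OV": "NV", "NG": "PL", "ZR": "NZ", "AC": "NA", "PE": "PO", "CY": "NC"}
    names = [name if is_set else clear[name] for name, is_set in flags.items()]
    return f"AL={result:02X} " + " ".join(names)


print(add8_flags(0x80, 0x80))   # AL=00 OV PL ZR NA PE CY
```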

Exercise: Show that after subtraction of two numbers if this flag is checked, it will be set only when the two data are exactly identical, no matter what the data are and no matter what the overflow and the carry flags say. But if the check is done after addition, and if the intention of the check is to see if the two data are negatives of each other, then overflow flag setting can cause problems. Give examples to support these statements and check them in the debug environment.

[Hint: There is only one specific case which can have such a problem on addition: that is the case indicated in the example above.]


The sign flag indicates that the result in the register is storing a negative number, if interpreted in the 2’s complement mode. That is, the leading or the leftmost bit in the result register is 1.

The auxiliary carry flag indicates, for an 8-bit or 16-bit arithmetic operation, the presence or absence of a carry out of the least significant digit, that is, the least significant nibble. This flag is useful in applying corrections, as we shall see later, in connection with decimal arithmetic instructions.

The parity flag indicates whether the number of 1s in the result register is even or odd (it is set for even parity). It is used in data communication applications for carrying out parity checks on the data received, and also for producing parity bits during transmission. Since communication normally uses 8-bit data, the parity flag reflects only the lower byte of the result: a 16-bit operation such as a 16-bit add produces a parity flag that corresponds only to the lower 8 bits of the result. See the demonstration below, in which each comment describes the AX value in the register display just above it:

-a
1377:0100 add ax, 0
1377:0103 add ax, ax
1377:0105 add ax, ax
1377:0107 add ax, ax
1377:0109 add ax, ax
1377:010B
-rax
AX 0000
:00f0
-r
AX=00F0 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 050000    ADD AX,0000     ; AX has even parity; AL also even parity
-t5
AX=00F0 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0103 NV UP EI PL NZ NA PE NC
1377:0103 01C0      ADD AX,AX       ; AX has even parity; AL has also even parity
AX=01E0 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0105 NV UP EI PL NZ NA PO NC
1377:0105 01C0      ADD AX,AX       ; AX has even parity; AL has odd parity
AX=03C0 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0107 NV UP EI PL NZ NA PE NC
1377:0107 01C0      ADD AX,AX       ; AX has even parity; AL also has even parity
AX=0780 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0109 NV UP EI PL NZ NA PO NC
1377:0109 01C0      ADD AX,AX       ; AX has even parity; AL has odd parity
AX=0F00 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=010B NV UP EI PL NZ NA PE NC
1377:010B 0000      ADD [BX+SI],AL  DS:0000=CD   ; AX has even parity; AL also has even parity
-q
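The parity rule can be stated in one line: PF reflects only bits 0 to 7 of the result. The hypothetical Python helper below applies it to the five AX values from the trace; all five words have an even number of 1s over 16 bits, yet PF alternates between PE and PO because only AL is examined.

```python
# Hypothetical helper: the 8086 parity flag looks only at the low byte.

def parity_flag(result: int) -> str:
    low_byte = result & 0xFF                # bits 8-15 are ignored
    return "PE" if bin(low_byte).count("1") % 2 == 0 else "PO"


for ax in (0x00F0, 0x01E0, 0x03C0, 0x0780, 0x0F00):
    whole = "even" if bin(ax).count("1") % 2 == 0 else "odd"
    print(f"AX={ax:04X}  PF={parity_flag(ax)}  (16-bit parity: {whole})")
```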

In addition, there are two more flags that the user can manipulate. These are: the direction flag and the interrupt flag. The direction or the D flag is used to control the direction of the string operation in the string type of instructions. The interrupt or I flag is used for interrupt control purposes as we shall see later.

There is still one more flag which is not accessible to the user, and that is the trace or the T flag, which is essentially controlled by the system.

The Flag register details are shown below bitwise (each column indicates a bit):

Bit    15  14  13  12  11  10   9    8      7     6     5   4        3   2       1   0
Flag   X   X   X   X   OV  DIR  INT  TRACE  SIGN  ZERO  X   AUX. CY  X   PARITY  X   CY

The Flag register has 16 bits which are shown above. X’s are don’t cares.
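Using the bit layout above, a 16-bit flag word can be decoded into the two-letter names DEBUG displays. This hypothetical Python sketch decodes 3202H and 3CD7H, the AX values seen before and after the XOR in the PUSHF/POPF experiment, and confirms that 0ED5H is exactly the set of the eight user-controllable flag bits. The table and function names are illustrative.

```python
# Hypothetical decoder for the 8086 flag word, using DEBUG's flag names.
FLAG_BITS = [  # (bit, name when 1, name when 0), in DEBUG's display order
    (11, "OV", "NV"),   # overflow
    (10, "DN", "UP"),   # direction
    (9,  "EI", "DI"),   # interrupt enable
    (7,  "NG", "PL"),   # sign
    (6,  "ZR", "NZ"),   # zero
    (4,  "AC", "NA"),   # auxiliary carry
    (2,  "PE", "PO"),   # parity
    (0,  "CY", "NC"),   # carry
]


def decode_flags(word: int) -> str:
    return " ".join(on if (word >> bit) & 1 else off for bit, on, off in FLAG_BITS)


USER_FLAGS_MASK = sum(1 << bit for bit, _, _ in FLAG_BITS)
print(hex(USER_FLAGS_MASK))              # 0xed5
print(decode_flags(0x3202))              # NV UP EI PL NZ NA PO NC
print(decode_flags(0x3202 ^ 0x0ED5))     # OV DN DI NG ZR AC PE CY
```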

The experiment suggested below is an attempt to study the flag register details in the debug environment.

-a                          ; assemble at cs:100
1377:0100 pushf             ; push flag register onto stack
1377:0101 pop ax            ; pop this into AX
1377:0102 xor ax,0ed5       ; toggle the eight flag bits
1377:0105 push ax           ; put this result back on the stack and then
1377:0106 popf              ; into the flag register
1377:0107                   ; ’enter’ pressed to end assembly
; Now execution of the program.
-r                          ; get the initial register contents
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 9C        PUSHF
-t5                         ; trace the next 5 instructions
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEC BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0101 NV UP EI PL NZ NA PO NC
1377:0101 58        POP AX
AX=3202 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0102 NV UP EI PL NZ NA PO NC
1377:0102 35D50E    XOR AX,0ED5
; Watch the flag display above and see how it matches the bits now in register AX.
; Watch the change in the parity flag on execution of this instruction. Reason out why.
AX=3CD7 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0105 NV UP EI PL NZ NA PE NC
1377:0105 50        PUSH AX
AX=3CD7 BX=0000 CX=0000 DX=0000 SP=FFEC BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI PL NZ NA PE NC
1377:0106 9D        POPF
AX=3CD7 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0107 OV DN DI NG ZR AC PE CY
-q

Explain the program and the flag conditions displayed after the POP AX, XOR AX,0ED5 and POPF instructions.

Exercises: There are in all 8 flags which can be user controlled, and the ‘xor’ing above reverses all these 8 flag bits. Use the method of PUSH AX followed by POPF to identify one by one, which flag register bit corresponds to which flag. Find also which value of the flag bit represents which condition of the flagged entity.

The data types that can be used

Although the computer basically works on the unsigned binary number system, there are instructions which can manipulate the data in registers/memory as signed numbers, decimal (binary coded or BCD) numbers, or ASCII characters for display purposes, and so on. We will now look into the details of these different data types handled by the instruction set of the 8086 processor. The data sizes handled by the processor are 8 bits and 16 bits, as we have already seen.

The data can be simple unsigned binary numbers. In this case, carry and zero flags will be of interest.

Question: In case of subtraction of unsigned numbers, do you think the sign flag can give any meaningful indication about which of the two numbers is bigger? [Hint: No. Can you support it with examples? Also try to reason out which flag(s) give this information about the greater of the two source numbers, in the subtraction.]

The processor 8086 can also handle signed binary numbers of 16-bits or 8-bits. In this case, the overflow, sign and zero flags will be of interest.

Question: Reason out which flags will be required to find out if the minuend is greater than, the same as or less than the subtrahend in case of subtraction/ comparison of two signed numbers. [Hint: All three flags indicated above. Give the logic for the comparison in terms of these three flags]

Two-digit BCD (binary coded decimal) numbers can be added or subtracted, provided the result of the binary addition or subtraction is in register AL. In general, BCD addition or subtraction requires two instructions: the first performs the normal binary add or subtract with the result in AL, and the second corrects, or adjusts, the binary result in AL so that it is consistent with the result of the BCD operation. For addition, the add must be followed by the instruction DAA (decimal adjust AL for addition); for subtraction, the subtract instruction must be followed by DAS (decimal adjust AL for subtraction). This BCD correction process uses the carry and the auxiliary carry flags, as we shall see later.
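The two-instruction pattern (a binary ADD followed by DAA) can be sketched in Python. The model assumes DAA follows the rules Intel documents for the instruction; the names add8, daa and bcd_add are hypothetical. The examples cover a low-digit correction triggered by a digit above 9, one triggered by the auxiliary carry, and a decimal carry out of the byte.

```python
# Hypothetical model of packed BCD addition: ADD AL,src followed by DAA.

def add8(a: int, b: int):
    total = a + b
    af = (a & 0x0F) + (b & 0x0F) > 0x0F     # carry out of the low nibble
    return total & 0xFF, total > 0xFF, af   # (AL, CF, AF)


def daa(al: int, cf: bool, af: bool):
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:               # low digit out of range: add 6
        al = (al + 0x06) & 0xFF
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:             # high digit out of range: add 60H
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf, af


def bcd_add(a: int, b: int):
    al, cf, af = add8(a, b)                 # ADD AL,BL (binary)
    return daa(al, cf, af)[:2]              # DAA -> (packed BCD result, decimal carry)


for a, b in ((0x38, 0x45), (0x48, 0x29), (0x99, 0x01)):
    al, cf = bcd_add(a, b)
    print(f"{a:02X} + {b:02X} -> {al:02X} CF={int(cf)}")   # 83 CF=0, 77 CF=0, 00 CF=1
```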

Question: In the above description, we have not included the comparison operation. Comparison, as you know, is based on the result of subtraction. Do you think we should use DAS after the comparison instruction while comparing two BCD numbers? [Hint: The answer is ‘No’. Try to reason out this issue.]

It should be noted that provision is made only for addition and subtraction of 2-digit BCD numbers, and these numbers must be unsigned. No direct way is provided for handling signed BCD numbers or larger BCD numbers. Direct multiplication and division of BCD numbers are not provided for in the 8086. The instructions AAA, AAS, AAM and AAD provide for decimal addition, subtraction, multiplication and division, essentially at the single-digit level and as two-stage operations, as we shall see later while discussing these instructions.

There are some (actually very few) provisions in the 8086 for handling ASCII (American Standard Code for Information Interchange) characters. The console keyboard and the monitor, as well as other input/ output devices, handle characters in ASCII code, as they have to deal with numbers as well as textual material. However, interpretation of data as ASCII is mainly done at the operating system level. Certain interrupt operations interpret the data in the registers AL, AH and AX as ASCII characters. We shall look into the details of these operations later.

Instruction Set Architecture

The 8086 uses what is known as the two-address register/ memory type of architecture. Many operations handled by microprocessors take two operands and produce a single result: arithmetic ADD and logical AND, for example, have two input operands and produce one result. To handle these three data items in all, we would need three registers or memory locations. The Intel 8086 processor specifies only two locations for such instructions: either two registers, or one register and one memory location. Specifying both source operands in memory is not permitted in the 8086. These two locations (register and register, or register and memory) specify the source operands. Then where does the result go? The result goes to the first source operand specified, which may be memory or a register, replacing that operand. This becomes quite convenient when an operation like ADD has to be performed on a series or chain of data stored in memory or registers. Suppose we want to add the contents of the AX, BX and CX registers. The two instructions ADD AX, BX followed by ADD AX, CX will give us, in AX, the total of BX, CX and what was originally in AX.

If the operation involves only a single operand like increment or rotate etc., the result naturally replaces that operand.

This type of instruction set architecture is known as register/ memory architecture.

Instruction set architectures used in other processors can be of the zero-address type (the operands are the top two entries of a stack, and the result replaces the stack-top operand; used in calculator-type systems), the accumulator or single-address type (one operand is assumed to be in a special register called the accumulator, the other source operand is specified by the instruction, and the result replaces the accumulator data; used mainly in 8-bit processors), or the three-address type, where three separate locations in registers or memory are specified, two for the source operands and one for the result. In three-address machines the data sources will normally be only in registers. This is known as load/ store or register/ register architecture: memory data can only be loaded into a register and register data can only be stored into memory, while only register data can participate in arithmetic and logic operations. This architecture is used mainly in RISC (reduced instruction set computer) type of machines. Other types of architecture also exist, but they are less commonly used.

Addressing Modes and Addressing

We have seen that many instructions need to specify two source operands. The 8086 processor permits a wide variety of methods for addressing the operands. If the operand is directly available in a register, the method of specifying this data is known as register direct. Consider the instruction ADD AX, [BX]. The instruction has three parts: the first part, ADD, is known as the opcode part; the second part, AX, is source 1; and the third, [BX], is source 2. Here the data for source 1 is the content of the register AX and is taken directly from the register AX; in an assembly language program this method of specifying the data is written by simply naming the register where the data is available. Source 2 is written as BX within square brackets. This means the data is not directly available in the register BX; the data is in memory at the address given by the content of the register BX. The method of addressing used here is known as register indirect. The net result of executing this instruction is to add the contents of memory at the address held in register BX to the data in register AX. The original contents of AX are replaced by the sum. This involves a memory read first, then the ALU operation, and finally writing the result obtained from the ALU of the processor into the register AX. Had the instruction been ADD [BX], AX, the AX data would not be altered on execution; the result from the ALU would instead be written to memory at the address held in BX. While ADD AX, [BX] involves only a memory read operation, ADD [BX], AX involves a memory read for source 1 and a memory write back to the same location indicated by BX. Note that instructions of the type ADD [BX], [SI] are not permitted, as they involve both operands in memory. Such operations are permissible only in processors that support memory/ memory operations; the Motorola 68000, for example, allows them for some instructions such as MOVE.

Exercise: Study register direct and indirect addressing in the debug environment.

Immediate addressing: Immediate addressing is done by giving the data directly in the instruction. For example, in the instruction ADD AX, 1234h, the data to be added to the contents of AX (register direct) is directly the hex number 1234, appearing immediately after AX in the instruction. The result will be stored in the location of the first source operand, that is, in register AX. Note that with this type of addressing, the first operand must be a place where the result can be stored, which means that the immediate data will always be the second operand.


Based or indexed addressing: Based or indexed addressing gives a useful method of addressing data arrays stored in memory. Although both do the same thing, the two names reflect two different situations in which this method of addressing can be used. An example of this type of addressing is INC BYTE PTR [BX+2], which can also be written as INC BYTE PTR 2[BX] (BYTE PTR tells the assembler that a byte, not a word, is to be incremented). This means: read the byte from memory at the address which is 2 more than the address contained in register BX, increment it, and write it back to the same location. Consider a byte array starting at address 1200 hex. The first element of the array is available by indirect addressing through the register BX, with BX holding the address 1200 hex, while the ith byte of the array, which has index i-1, is addressable as [BX+i-1]. Here the base address of the array is in BX and the index number is specified as an unsigned integer in the instruction. This is known as based addressing. Now consider another situation, where we have two different byte arrays, one starting at 1200 hex and the other at, say, 1380 hex, and we want to handle, say, the 5th byte of each array. We then store 5 - 1, that is, the index number 4, in the BX register, and use the address 1200h[BX] to refer to the 5th byte of the first array and 1380h[BX] to refer to the 5th byte of the second array. In this case the index number is stored in the register, and the base address of the array is given as a number (a displacement) in the instruction, which is added to the content of BX to get the address of the data in memory. Since the index number is now in the register BX, this method of addressing is known as indexed addressing. Note that the processing required to get the memory address is the same in both based and indexed addressing; the difference is only in our interpretation in terms of the problem requirement.

Based and indexed addressing: Intel 8086 permits a combination of based and indexed addressing with a displacement number in the instruction. For this purpose, BX and BP are considered base registers, while SI and DI are considered index registers. Any one base register and any one index register, along with an additional offset number, can be used for addressing in this mode. The address 2[BP+DI] is a valid address in this mode; it corresponds to the address obtained by adding the contents of the BP and DI registers and then adding the number 2 to this sum. The address 2[BX+BP] is invalid, and so is the address 2[SI+DI], as neither satisfies the rule of one base register and one index register within the square brackets. There are several ways in which based indexed operands with displacements can be written in an assembly language program. Exercise: Use the debug environment to find four valid ways of writing this based indexed addressing, with displacement, in assembly language. (Hint: Try various forms like 2[bx][di], 2[bx, di], 2[bx]di etc. and find which get correctly unassembled as [bx+di+2].)
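As a sketch of the rule just stated, the Python function below accepts only one base register and one index register, and forms the effective address as base + index + displacement, kept to 16 bits. The function name `based_indexed_ea` and the register values are made up for illustration; only the arithmetic and the register pairing follow the text.

```python
BASE_REGISTERS = ("BX", "BP")
INDEX_REGISTERS = ("SI", "DI")


def based_indexed_ea(regs, base, index, disp=0):
    """Effective address of disp[base+index]; regs maps register names to 16-bit values."""
    if base not in BASE_REGISTERS:
        raise ValueError(f"{base} is not a base register (BX or BP)")
    if index not in INDEX_REGISTERS:
        raise ValueError(f"{index} is not an index register (SI or DI)")
    # The sum is kept to 16 bits: an effective address is only a 16-bit offset.
    return (regs[base] + regs[index] + disp) & 0xFFFF


regs = {"BX": 0x1200, "BP": 0x0100, "SI": 0x0004, "DI": 0x0010}
print(hex(based_indexed_ea(regs, "BP", "DI", 2)))   # 0x112, i.e. 2[BP+DI]
for bad in (("BX", "BP"), ("SI", "DI")):            # 2[BX+BP] and 2[SI+DI]
    try:
        based_indexed_ea(regs, *bad, 2)
    except ValueError as err:
        print("invalid:", err)
```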

The role of the segment registers in memory addressing: The Intel 8086 processor addresses memory with 20 bits of address, 00000 hex to FFFFF hex. So far we have seen that addresses can be contained in 16-bit registers (as in register indirect, based or indexed addressing, etc.). How, then, is the 20-bit address produced? The answer lies in the fact that not one but two 16-bit values are used in producing the 20-bit address. In the above section on addressing modes we have already seen how the 16-bit memory address is generated from the instruction. The address thus obtained is known as the EA (effective address) or the offset address of the data. This address is combined with a value derived from one of the segment registers to produce the 20-bit absolute address of the data. The 16-bit segment register content is extended to 20 bits by simply appending four binary 0's, that is, one hex zero, at its right (least significant) end. The 16-bit EA obtained from the instruction is then added to this 20-bit value to get the 20-bit absolute memory address. Any carry resulting from this addition is simply ignored by the processor. Which of the four segment registers goes with which effective address? IP is the EA of the instruction to be fetched, and it always goes with the code segment register CS. Normally the EA of any data goes with the data segment register DS. However, in string instructions the EA of the source is held in the SI register, and this goes with the data segment, while the EA of the destination is held in the DI register, and this goes with the extra segment ES. The use of separate segments for the source and destination addresses permits the movement of data from any source address to any destination address in the full 20-bit address range of the memory. Attaching both the source and destination addresses to the same segment register would give a total address range of only 16 bits from the segment base, as the effective address is only 16 bits. Note that the offset address in any segment can only be 16 bits, which means a segment can accommodate 65536 bytes of data or program.
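The address computation described above can be summarised by one formula: physical address = segment × 16 + offset, with any carry beyond 20 bits dropped. The Python sketch below models only this arithmetic (the function name is our own); the values 1377:0200 are borrowed from the debug session that follows.

```python
def physical_address(segment, offset):
    """20-bit physical address from a 16-bit segment value and a 16-bit offset (EA)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must both be 16-bit values")
    # Append one hex zero to the segment (shift left by 4), add the offset,
    # and drop any carry out of bit 19, as the 8086 does.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1377, 0x0200)))   # 0x13970
print(hex(physical_address(0xFFFF, 0x0010)))   # 0x0 (carry out of 20 bits ignored)
```

Many different segment:offset pairs name the same physical byte, which is what the exercise on segment override below asks you to exploit.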

Segment and effective address register combinations permitted: The IP register is always combined with CS, as we have already seen. The BX, SI and DI registers normally go with the data segment register DS. In string instructions, DI always goes with ES. Register SP always goes with SS, and BP normally goes with SS. Wherever we have used the word ‘normally’, other segment registers can be used, provided they are explicitly stated in the instruction by means of segment override instruction prefixes. The following study of instruction prefixes in the debug environment also shows how the ambiguity about whether memory data is a word or a byte is resolved.

-a 100
1377:0100 mov bx,200
1377:0103 cs:                    ; segment override prefix. Normally
                                 ; [BX] would use the DS segment.
1377:0104 mov word ptr [bx],34   ; ambiguity removed (word at address bx
                                 ; intended); data intended is not the
                                 ; byte 34, but the word 0034
1377:0108 cs:
1377:0109 mov ax,[bx]            ; no ambiguity: AX is a word register,
                                 ; so BX is a word pointer.
1377:010B
-u 100 10a                       ; unassemble between 100 and 10a
1377:0100 BB0002        MOV     BX,0200
1377:0103 2E            CS:                      ; segment override prefix
1377:0104 C7073400      MOV     WORD PTR [BX],0034
1377:0108 2E            CS:                      ; segment override prefix again
1377:0109 8B07          MOV     AX,[BX]
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 BB0002        MOV     BX,0200
-t3
AX=0000 BX=0200 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0103 NV UP EI PL NZ NA PO NC
1377:0103 2E            CS:
1377:0104 C7073400      MOV     WORD PTR [BX],0034        CS:0200=75C2
AX=0000 BX=0200 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0108 NV UP EI PL NZ NA PO NC
1377:0108 2E            CS:
1377:0109 8B07          MOV     AX,[BX]                   CS:0200=0034
AX=0034 BX=0200 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=010B NV UP EI PL NZ NA PO NC
-q                               ; quit debug and return to the DOS environment

Exercise: The DS register in an 8086 has the hex number 1234. What memory address in 20 bits is indicated at the effective or offset address of 123A hex? If the segment override prefix, ES: is used, which effective address will point to the same physical memory location when ES has the hex address 123B? [Ans: 1357A hex and 11CA hex]

Instruction set: The instructions of the 8086 are discussed in detail below. The instruction descriptions and operation details are taken essentially from the Intel IA-32 Software Developer’s Manual, vol. 2. To start with, the various types of 8086 instructions are listed; then the instructions available in each type and the details of their operation are presented. The types of instructions are as follows:

1. Data transfer instructions (including I/O transfers)
2. Binary arithmetic instructions
3. Decimal (BCD, ASCII) arithmetic instructions
4. Logical instructions
5. Shift and rotate instructions
6. Control transfer instructions
7. String instructions
8. Flag control instructions
9. Segment register instructions
10. Miscellaneous instructions

Instruction Details


1. Data Transfer instructions (including I/O transfers): Data transfer instructions essentially copy data and do not affect any flags (except, of course, the POPF instruction, which modifies all the flags according to the word at the stack top).

• MOV: stands for move, but it is actually a copy: when data is moved from a source register to a destination register, the source is not destroyed; the destination register simply receives a copy of the data.

Examples:
mov ax, bx                   ; data in bx is copied into ax
mov dx, [bx]
mov [si + 34], cx
mov bx, 1234h                ; 1234 hex goes to reg. bx
mov word ptr [bx+si+2], 23   ; 23 decimal, or 0017h, is moved

• XCHG: The exchange instruction exchanges data between two registers or between a register and memory.
Examples:
xchg bx, dx        ; reg-reg exchange
xchg ax, [bx]      ; reg-memory exchange
xchg bx, [1234]    ; reg-memory with direct addressing

• PUSH: Push copies the word in the source register or memory location on to the top of the stack.
Examples:
push ax
push [bx]          ; push memory word at address in bx on to the stack
push [1234]        ; push word at effective address 1234 on to the stack
pushf              ; push the flag register on to the stack top

• POP: Pop moves the stack top (that is, removes it from the stack and loads it) into the destination register or memory location specified by the instruction.
Examples:
pop bx             ; stack top moved to reg. bx
pop [bx + si + 4]  ; stack top to memory at the address given
pop [1234]         ; pop to memory at effective address 1234
popf               ; pop the stack top into the flag register

Among all the data transfer instructions popf is the only instruction that affects and modifies the flags.
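To picture what PUSH and POP do to SS:SP, here is a toy Python model. It assumes the standard 8086 behaviour that PUSH first decreases SP by 2 and then stores the word (low byte at the lower address), and that POP reads the word and then increases SP by 2. The class name `WordStack` is hypothetical, and the SS and SP values are borrowed from the debug sessions in this chapter.

```python
class WordStack:
    """Hypothetical model of the 8086 stack at SS:SP (not a DOS or debug facility)."""

    def __init__(self, ss, sp):
        self.ss, self.sp = ss, sp
        self.memory = {}                          # physical address -> byte

    def _physical(self, offset):
        return ((self.ss << 4) + (offset & 0xFFFF)) & 0xFFFFF

    def push(self, word):
        self.sp = (self.sp - 2) & 0xFFFF          # SP moves down by 2 first
        self.memory[self._physical(self.sp)] = word & 0xFF             # low byte
        self.memory[self._physical(self.sp + 1)] = (word >> 8) & 0xFF  # high byte

    def pop(self):
        low = self.memory[self._physical(self.sp)]
        high = self.memory[self._physical(self.sp + 1)]
        self.sp = (self.sp + 2) & 0xFFFF          # and moves back up by 2 after a pop
        return (high << 8) | low


stack = WordStack(ss=0x1377, sp=0xFFEE)
stack.push(0x1234)          # like PUSH AX with AX = 1234h
stack.push(0xABCD)          # like PUSH BX with BX = ABCDh
print(hex(stack.pop()), hex(stack.pop()), hex(stack.sp))   # 0xabcd 0x1234 0xffee
```

The last word pushed is the first one popped, and SP returns to its starting value once every push has been matched by a pop.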

• IN: The IN instruction reads from an input port into AL or AX (only these two registers), which must be specified in the instruction. The port address is generally held in the register DX, but if it fits in 8 bits (00 to FF hex), it can instead be given directly in the instruction.
Examples:
in ax, 28h     ; read 16-bit port at address 28 hex into reg. ax
in ax, dx      ; read 16-bit port at address in dx into reg. ax
in al, 15h     ; read 8-bit port at address 15 hex into reg. al
in al, dx      ; read 8-bit port at address in dx into reg. al

• OUT: The OUT instruction outputs the data in register AX or AL (only these two), which must be specified in the instruction, to an output port. The port address is given directly in the instruction if it fits in 8 bits, or held in the register DX (for any address of up to 16 bits).
Examples:
out 16h, ax    ; write to output port at address 16h from reg. ax
out dx, ax     ; write to output port at address in dx from reg. ax
out 23h, al    ; write to output port at address 23h from reg. al
out dx, al     ; write to output port at address in dx from reg. al

• CBW: Convert byte to word. The source register al and the destination ax are both implied and not mentioned in the instruction. This instruction extends the 8-bit signed integer in reg. al to a 16-bit signed integer in ax, a process called sign extension. If the number in al is non-negative (bit 7 is 0), ah is loaded with 00 hex; otherwise ah is loaded with ff hex.
Example: cbw
Exercise: Study the instruction in debug.

• CWD: Convert the word in reg. ax (implied, not stated in the instruction) to a double word in regs. dx:ax (also implied and not stated); that is, sign extend ax into dx:ax. dx is loaded with 0000 hex if bit 15 of ax is 0, and with ffff hex otherwise.
Example: cwd
Exercise: Study the instruction in debug.
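Sign extension by CBW and CWD amounts to copying the sign bit into every bit of the upper register. A minimal Python sketch of this follows; the function names merely mirror the mnemonics and are not part of any library.

```python
def cbw(al):
    """AX after CBW: AL is kept, AH becomes a copy of AL's sign bit."""
    ah = 0xFF if al & 0x80 else 0x00
    return (ah << 8) | al


def cwd(ax):
    """(DX, AX) after CWD: AX is kept, DX becomes a copy of AX's sign bit."""
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax


print(hex(cbw(0x85)))                  # 0xff85 (-123 stays -123)
print(hex(cbw(0x7F)))                  # 0x7f (+127 stays +127)
print([hex(w) for w in cwd(0xFFF9)])   # ['0xffff', '0xfff9'] (-7 in DX:AX)
```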

2. Binary Arithmetic instructions: All binary arithmetic instructions update the flag register based on the result of the operation performed.

• ADD: Description: Adds the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format. The ADD instruction performs integer addition. It evaluates the result for both signed and unsigned integer operands and sets the OF and CF flags to indicate a carry (overflow) in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.
Operation:

DEST ← DEST + SRC;
The OF, SF, ZF, AF, PF, and CF flags are set according to the result.
Examples:
ADD AX, BX
ADD [BX + 4], DX
ADD CX, 2[SI]
ADD DX, 123H

• ADC: Adds with carry. Same as ADD with the following change in the operation:

DEST ← DEST + SRC + CF; The OF, SF, ZF, AF, PF, and CF flags are set according to the result.
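ADC is what makes arithmetic on numbers wider than 16 bits possible: the low words are added with ADD, and the high words with ADC, so that the carry from the low half is included. The Python sketch below models this for 32-bit numbers. The function names are our own, and the register assignment mentioned in the comments (first number in DX:AX, second in CX:BX) is only an assumed example.

```python
def add16(a, b, carry_in=0):
    """16-bit ADD (carry_in=0) or ADC (carry_in=CF): returns (result, CF)."""
    total = a + b + carry_in
    return total & 0xFFFF, int(total > 0xFFFF)


def add32(x, y):
    """Add two 32-bit values one 16-bit word at a time, as ADD then ADC would."""
    low, cf = add16(x & 0xFFFF, y & 0xFFFF)        # e.g. ADD AX, BX (low words)
    high, cf = add16(x >> 16, y >> 16, cf)         # e.g. ADC DX, CX (high words + CF)
    return (high << 16) | low, cf


print([hex(v) for v in add32(0x0001FFFF, 0x00000001)])   # ['0x20000', '0x0']
```

Using plain ADD for the high words would lose the carry out of the low words, which is exactly the case shown in the usage line.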

• SUB: Subtracts. Follows the same rules as ADD with the following change in the operation:

DEST ← DEST – SRC; The OF, SF, ZF, AF, PF, and CF flags are set according to the result.

• SBB: Subtract with borrow. Same as SUB with the following change in the operation:

DEST ← DEST – (SRC + CF); The OF, SF, ZF, AF, PF, and CF flags are set according to the result.

• CMP: Compare two operands.

Compares the first source operand with the second source operand and sets the status flags in the FLAGS register according to the results. The comparison is performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction; neither operand is altered. When an immediate value is used as an operand, it is sign-extended to the length of the first operand.
Operation:
temp ← SRC1 − SRC2;
(In case an immediate value is used, then temp ← SRC1 − SignExtend(SRC2);)
Modify status flags in the same manner as the SUB instruction.
Flags affected: The CF, OF, SF, ZF, AF, and PF flags are set according to the result.
Examples:
CMP AX, 24H               ; (24 is sign extended to 16 bits before subtraction, because AX is a 16-bit register)
CMP BYTE PTR [BX], -24H   ; (no sign extension done, as byte data are being handled)
CMP BX, SI
CMP AL, [BX]              ; (BX will be taken only as a byte pointer, as AL is a byte register)

The meaning of sign extension is seen in the following debug study. Note that the negative numbers -25 and -db are encoded as the 16-bit two's complement values FFDB and FF25, while the positive number db is encoded as 00DB:

-a
1377:0100 cmp ax,25
1377:0103 cmp ax,-25
1377:0106 cmp ax,db
1377:0109 cmp ax,-db
1377:010C
-u 100 10B
1377:0100 3D2500        CMP     AX,0025
1377:0103 3DDBFF        CMP     AX,FFDB
1377:0106 3DDB00        CMP     AX,00DB
1377:0109 3D25FF        CMP     AX,FF25

• MUL: Description: Performs an unsigned multiplication of the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand is an implied operand located in register AL or AX (depending on the size of the operand); the source operand is located in a general-purpose register or a memory location. The action of this instruction and the location of the result depend on the opcode and the operand size.
Operation:
IF byte operation
THEN
    AX ← AL ∗ SRC
ELSE (* word operation *)
    DX:AX ← AX ∗ SRC
Flags affected: The OF and CF flags are set to 0 if the upper half of the result is 0; otherwise, they are set to 1. The SF, ZF, AF, and PF flags are undefined.
Examples:
MUL BX
MUL WORD PTR [BX + DI + 48H]
MUL BYTE PTR [SI]
MUL CL
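The CF/OF rule for MUL can be checked with a small Python model (the function names are our own). It simply tests whether the upper half of the product, AH or DX, is non-zero.

```python
def mul8(al, src):
    """MUL with a byte operand: returns (AX, CF_and_OF)."""
    ax = al * src                      # at most FFh * FFh = FE01h, always fits in AX
    return ax, (ax >> 8) != 0          # CF = OF = 1 only if AH is non-zero


def mul16(ax, src):
    """MUL with a word operand: returns (DX, AX, CF_and_OF)."""
    product = ax * src
    dx, ax = product >> 16, product & 0xFFFF
    return dx, ax, dx != 0             # CF = OF = 1 only if DX is non-zero


print(mul8(0x10, 0x0F))       # (240, False): F0h fits in AL
print(mul8(0x20, 0x10))       # (512, True): 0200h needs AH
print(mul16(0x1234, 0x0010))  # (1, 9024, True): 0001h:2340h
```

A program can therefore test CF after MUL to decide whether the upper half of the product needs to be kept at all.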

• IMUL: Integer (signed) multiply. Similar to MUL, except that the data are considered as signed integers. The flags are affected in a similar way, except that OF and CF are set to 0 when the upper half of the result is merely the sign extension of the lower half, and to 1 otherwise.

• DIV: Divides the unsigned integer dividend in the accumulator by the unsigned integer divisor specified in the instruction. If the divisor specified is a word register or a word in memory, the dividend is the double word in DX:AX; the quotient of the division will be in register AX and the remainder in register DX, and the divisor word specified in the instruction is not altered. If the divisor specified in the instruction is a byte register or a byte in memory, the dividend is the word in register AX; the quotient will be in register AL and the remainder in register AH. In word division, if the divisor word is not greater than the high-order part of the dividend in DX, the quotient obviously will not fit into the register AX. Execution of the DIV instruction in such a case causes a divide overflow exception to be generated, and the operating system has to take care of this exception. We shall see later what ‘exception’ means. Similarly, if the byte divisor in a byte divide instruction is not greater than the byte part of the dividend contained in register AH, a divide overflow exception is generated.
Examples of the DIV instruction:
DIV BX
DIV WORD PTR [DI]
DIV CL
DIV BYTE PTR [SI]
Exercise: Test the DIV instruction in the debug environment. See what happens when the data are such as to generate a divide overflow exception. [Hint: Here is an example of such a study.

-a
1377:0100 div bl
1377:0102
-rax
AX 0000
:1234
-rbx
BX 0000
:000f
-r
AX=1234 BX=000F CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 F6F3          DIV     BL        ; Note AH > BL, so a divide overflow exception will occur.
-t                                        ; trace the execution of this instruction
Divide overflow                           ; this is now in the DOS environment, as seen by the
                                          ; DOS prompt appearing below; otherwise the debug
                                          ; prompt '-' would have been seen in front of this.
C:DOCUME~1\acer\MYDOCU~1\MYFILE~1\REF~1.MAT\DOSPRO~1>    ; DOS prompt

On executing the instruction, the divide overflow exception is seen to cause an exit from debug to the DOS environment, and the words ‘Divide overflow’ are displayed in the DOS environment.]

The CF, OF, SF, ZF, AF and PF flags are undefined when DIV is executed.
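The overflow rule above can be expressed as a short Python sketch for the byte case. The model (the names `DivideError` and `div8` are hypothetical) raises an exception whenever the quotient would not fit in AL, and a brute-force check confirms that this is the same as the text's rule that the divisor must be greater than AH.

```python
class DivideError(Exception):
    """Stands in for the 8086 divide error (interrupt type 0)."""


def div8(ax, divisor):
    """DIV with a byte divisor: returns the new AX (AH = remainder, AL = quotient)."""
    if divisor == 0 or ax // divisor > 0xFF:
        raise DivideError("Divide overflow")
    quotient, remainder = divmod(ax, divisor)
    return (remainder << 8) | quotient


# The quotient fits in AL exactly when AH < divisor, the rule stated in the text.
assert all((ax // d > 0xFF) == ((ax >> 8) >= d)
           for ax in range(0, 0x10000, 37) for d in range(1, 0x100))

print(hex(div8(0x0064, 0x07)))   # 0x20e: 100 / 7 = 14 (0Eh), remainder 2
try:
    div8(0x1234, 0x0F)           # the debug example: AH = 12h is not below BL = 0Fh
except DivideError as err:
    print(err)                   # Divide overflow
```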

• IDIV: Integer divide, same as DIV but the data and the results are considered as signed integers.

The CF, OF, SF, ZF, AF, and PF flags are undefined, when IDIV is executed.

A question naturally arises with signed division. Suppose we divide -7 by +3. There is no doubt about the sign of the quotient: it can only be negative. The doubt is about the magnitude of the quotient and the sign of the remainder. In this example, one can say the quotient is -3 and the remainder is +2, or the quotient is -2 and the remainder is -1, as both solutions satisfy the basic requirements that (quotient)*(divisor) + remainder = dividend and that the magnitude of the remainder is less than the magnitude of the divisor:

(-3)*(+3) + (+2) = -7; also (-2)*(+3) + (-1) = -7. Which is correct? Exercise: Try to see, in the debug environment, what the processor actually gives, and try to reason out logically whether that is all right. [Hint: You will find that the processor divides the magnitudes involved and then attaches signs as necessary, based on the signs of the given data. In the given example, the magnitude 7 is divided by the magnitude 3 to get 2 for the quotient and 1 for the remainder. Since the dividend and the divisor have opposite signs, the quotient becomes negative, while the remainder carries the same sign as the dividend. This is logically sound: if you distribute 7 negative units among three people, after giving 2 negative units to each, you are left with 1 negative unit. So the quotient is -2 and the remainder is -1.] The demonstration in the debug environment is shown below.

-a
1377:0100 idiv bl
1377:0102
-rax
AX 0000
:fff9                 ; this makes AX, the dividend, = -7
-rbx
BX 0000
:803                  ; this makes the divisor in BL = +3 (we ignore BH)
-r
AX=FFF9 BX=0803 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 F6FB          IDIV    BL
-t
AX=FFFE BX=0803 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0102 NV UP EI PL NZ NA PO NC
                      ; we see the result: quotient in AL = FE hex = -2, and remainder in AH = FF hex = -1.
                      ; The divisor in BL is unaltered. (The data in BH, which is not relevant
                      ; to the program, is also not altered.)
-q                    ; quit debug.

Note: In the above experiment, if you initially load into AX a number like FABC hex (-1348 decimal) and keep BL the same, you will see a divide overflow occurring, because the quotient, -449, cannot fit into the signed byte register AL.
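The "divide the magnitudes, then attach the signs" behaviour described in the hint can be modelled in Python as below. Python's own `//` and `divmod` round toward minus infinity, which gives the other candidate answer (-3, +2), so the sketch divides the magnitudes explicitly. The function names are our own, and the quotient range used for the overflow check is an assumption, flagged in a comment.

```python
class DivideError(Exception):
    """Stands in for the 8086 divide error (interrupt type 0)."""


def signed(value, bits):
    """Read a raw bit pattern as a two's complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def idiv8(ax, bl):
    """IDIV with a byte divisor: returns the new AX (AH = remainder, AL = quotient)."""
    dividend, divisor = signed(ax, 16), signed(bl, 8)
    if divisor == 0:
        raise DivideError("Divide overflow")
    # Divide the magnitudes, then attach the signs.
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor   # takes the sign of the dividend
    # Assumption: the IA-32 range -128..127 is used here; the original 8086
    # may treat a quotient of exactly -128 differently.
    if not -128 <= quotient <= 127:
        raise DivideError("Divide overflow")
    return ((remainder & 0xFF) << 8) | (quotient & 0xFF)


print(hex(idiv8(0xFFF9, 0x03)))   # 0xfffe: AL = FEh (-2), AH = FFh (-1), as in debug
print(divmod(-7, 3))              # (-3, 2): Python floors, which the 8086 does not do
try:
    idiv8(0xFABC, 0x03)           # -1348 / 3 = -449 does not fit in AL
except DivideError as err:
    print(err)
```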

• INC: Increment register or memory. This involves only a single operand. Adds 1 to the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag. (An ADD instruction with an immediate operand of 1 also performs an increment, but it does update the CF flag.)
Operation:
DEST ← DEST + 1;
The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set according to the result.

• DEC: Similar to INC, this instruction does the operation:
DEST ← DEST – 1;
The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set according to the result.

• NEG: Replaces the value of the operand (the destination operand) with its two’s complement. (This operation is equivalent to subtracting the operand from 0.) The destination operand is located in a general-purpose register or a memory location.
Operation:
DEST ← – (DEST)
Flags affected: The CF flag is set to 0 if the source operand is 0; otherwise it is set to 1. The OF, SF, ZF, AF, and PF flags are set according to the result.
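The difference between INC and an ADD of 1, and the flag rules for NEG, can be seen in the Python sketch below. It is a simplified model (the function names are hypothetical) that tracks only CF, ZF, SF and OF.

```python
def inc8(value, cf):
    """INC on a byte: CF is passed through unchanged (AF and PF are not modelled)."""
    result = (value + 1) & 0xFF
    return result, {"CF": cf,                       # not affected by INC
                    "ZF": result == 0,
                    "SF": bool(result & 0x80),
                    "OF": value == 0x7F}            # +127 + 1 gives -128


def neg8(value):
    """NEG on a byte: two's complement (AF and PF are not modelled)."""
    result = (-value) & 0xFF                        # same as 0 - value
    return result, {"CF": value != 0,               # 0 only when the operand is 0
                    "ZF": result == 0,
                    "SF": bool(result & 0x80),
                    "OF": value == 0x80}            # -128 has no byte-sized +128


print(inc8(0xFF, False))   # CF stays False although FFh + 1 wraps to 0
print(neg8(0x05)[0] == 0xFB, neg8(0x00)[1]["CF"])   # True False
```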


3. Decimal (BCD, ASCII) arithmetic instructions: The 8086 provides for handling decimal digits in byte size, either as 2-digit packed BCD (unsigned) or as single-digit ASCII character bytes (30 to 39 hex in order, standing for the BCD digits 0 to 9). The instructions DAA and DAS handle 2-digit BCD for addition and subtraction, while the instructions AAA, AAS, AAM and AAD handle single-digit ASCII data for BCD addition, subtraction, multiplication and division respectively. Inputs from the keyboard or other input devices, as well as outputs to the monitor, printer and other output devices, will usually be in ASCII code, so the four ASCII adjust instructions above, whose names start with AA, facilitate the handling of BCD digits in ASCII character code for add, subtract, multiply and divide operations. Note also that all six instructions above have A as the middle character. This A stands for ADJUST. It implies that the add, subtract, multiply and divide operations are not done by these instructions; they are done separately by the normal ADD, SUB, MUL and DIV instructions, treating the data as normal binary. What these instructions do is adjust the result of the binary operation to match the result of the decimal operation. (AAD is the exception in timing: it adjusts the dividend in AX before the DIV instruction, rather than adjusting a result afterwards.) We now look into the details.

• DAA: Decimal adjust accumulator for addition.
Description: Adjusts the sum of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two 2-digit, packed BCD values and stores a byte result in the AL register. The DAA instruction then adjusts the contents of the AL register to contain the correct 2-digit, packed BCD result. If a decimal carry is detected, the CF and AF flags are set accordingly.
Operation: A complete description of the operation is as follows:
old_AL ← AL;
old_CF ← CF;                (AL and CF are saved in temporary registers)
CF ← 0;
IF (((AL AND 0FH) > 9) OR AF = 1)
THEN
    AL ← AL + 6;
    CF ← old_CF OR (Carry from AL ← AL + 6);
    AF ← 1;
ELSE
    AF ← 0;
                            (the first IF ends here)
IF ((old_AL > 99H) OR (old_CF = 1))
THEN
    AL ← AL + 60H;
    CF ← 1;
ELSE
    CF ← 0;
Flags affected: The CF and AF flags are set if the adjustment of the value results in a decimal carry in either digit of the result (see the “Operation” section above). The SF, ZF, and PF flags are set according to the result. The OF flag is undefined.
An experimental study of DAA is presented below: an ALP was converted into an executable program with MASM and LINK, and the program was executed in the debug environment.
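The DAA pseudocode can be transcribed almost line for line into Python, as in the sketch below. The helper `add_then_daa`, which also models the CF and AF that the preceding ADD produces, is our own addition. Running it on the nine augends used in the program gives the correct decimal sums; the first four can be compared with the AL values in the debug trace.

```python
def daa(al, cf, af):
    """Apply the DAA pseudocode to AL and the CF/AF left by ADD.

    Returns the adjusted (AL, CF, AF).
    """
    old_al, old_cf = al, cf
    cf = False
    if (al & 0x0F) > 9 or af:
        carry = al + 6 > 0xFF              # carry out of AL <- AL + 6
        al = (al + 6) & 0xFF
        cf = old_cf or carry
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf, af


def add_then_daa(a, b):
    """ADD AL, BL followed by DAA, for packed BCD bytes a (in AL) and b (in BL)."""
    total = a + b
    cf = total > 0xFF                            # carry out of bit 7
    af = (a & 0x0F) + (b & 0x0F) > 0x0F          # carry out of bit 3
    return daa(total & 0xFF, cf, af)


# The nine augends of the program, each added to 87h in BL:
for augend in (0x12, 0x19, 0x91, 0x32, 0x16, 0x96, 0x69, 0x99, 0x67):
    al, cf, _ = add_then_daa(augend, 0x87)
    print(f"{augend:02X} + 87 = {al:02X}, CF = {int(cf)}")
```

Reading CF as the hundreds digit, each line is an ordinary decimal sum; for example, 19 + 87 prints 06 with CF = 1, that is, 106.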

The assembly language program studied is given below. (Note the data chosen to be added to the byte 87 hex [1000 0111] in the BL register in the nine cases.)

code segment
assume cs:code
start:   mov bl, 87h        ; this is the addend
         mov cx, 9          ; 9 different augends are chosen
assume ds:code
         mov ax, cs
         mov ds, ax         ; initialise data segment; note this method
         cld
         lea si, augends
back:    lodsb              ; string load byte, without 'rep' prefix;
                            ; note cx (count reg) is not relevant here
         add al, bl         ; get the binary sum
         daa                ; correct the sum for decimal addition;
                            ; note, data in ah is unaffected by this inst.
         loop back
         int 01
augends  db 12h             ; no cy, no ac, no 'abcdef' hex in the sum
         db 19h             ; no cy, ac, 'a' in msd
         db 91h             ; cy, no ac, no 'abcdef' in the sum
         db 32h             ; 'b' in msd, no cy, no ac
         db 16h             ; 'd' in lsd, no cy, no ac
         db 96h             ; cy and 'd' in lsd
         db 69h             ; ac and 'f' in msd
         db 99h             ; ac and cy and no 'abcdef' in the sum
         db 67h             ; sum becomes 'ee'
code ends
end start

The execution of the program in the debug environment:

-u 0 1e                            ; unassemble from 0 to 1e hex in the code segment
13D5:0000 B387        MOV     BL,87
13D5:0002 B90900      MOV     CX,0009      ; 9 different data added to 87
13D5:0005 8CC8        MOV     AX,CS
13D5:0007 8ED8        MOV     DS,AX        ; DS is made the same as CS
13D5:0009 FC          CLD                  ; instruction yet to be studied
13D5:000A 8D361600    LEA     SI,[0016]    ; yet to be studied
13D5:000E AC          LODSB                ; yet to be studied
13D5:000F 02C3        ADD     AL,BL
13D5:0011 27          DAA
13D5:0012 E2FA        LOOP    000E         ; yet to be studied
13D5:0014 CD01        INT     01           ; end execution & return to Debug
; From here on it is actually data sitting in the code segment,
; interpreted as instructions (unassembled) by the 'u 0 1e' command.
13D5:0016 12 19       ADC     BL,[BX+DI]   ; 2 data bytes
13D5:0018 91          XCHG    CX,AX        ; 1 data byte
13D5:0019 32 16 96 69 XOR     DL,[6996]    ; 4 data bytes
13D5:001D 99          CWD                  ; 1 data byte
13D5:001E 67          DB      67           ; 1 data byte

-g 11 ; execute until (and excluding) the instruction at 11 hex, that is, stop just
      ; before DAA for the first data (after ADD AL,BL). From here, the program is
      ; traced for every data (the LOOP step is shown only for some of the data).
AX=1399 BX=0087 CX=0009 DX=0000 SP=0000 BP=0000 SI=0017 DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ NA PE NC
13D5:0011 27 DAA
AX=1399 BX=0087 CX=0009 DX=0000 SP=0000 BP=0000 SI=0017 DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI NG NZ NA PE NC ; no change in result; BCD and binary addition give the same result
13D5:0012 E2FA LOOP 000E
AX=1399 BX=0087 CX=0008 DX=0000 SP=0000 BP=0000 SI=0017 DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000E NV UP EI NG NZ NA PE NC
13D5:000E AC LODSB ; second data
AX=1319 BX=0087 CX=0008 DX=0000 SP=0000 BP=0000 SI=0018 DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI NG NZ NA PE NC
13D5:000F 02C3 ADD AL,BL
AX=13A0 BX=0087 CX=0008 DX=0000 SP=0000 BP=0000 SI=0018 DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ AC PE NC
13D5:0011 27 DAA
AX=1306 BX=0087 CX=0008 DX=0000 SP=0000 BP=0000 SI=0018 DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ AC PE CY ; note the modification here: 66H added to the result of binary add; why?
13D5:000E AC LODSB ; third data
AX=1391 BX=0087 CX=0007 DX=0000 SP=0000 BP=0000 SI=0019 DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI PL NZ AC PE CY
13D5:000F 02C3 ADD AL,BL
AX=1318 BX=0087 CX=0007 DX=0000 SP=0000 BP=0000 SI=0019 DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 OV UP EI PL NZ NA PE CY
13D5:0011 27 DAA
AX=1378 BX=0087 CX=0007 DX=0000 SP=0000 BP=0000 SI=0019 DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 OV UP EI PL NZ NA PE CY ; note here, 6 added to MSD. Reason?
13D5:000E AC LODSB ; fourth data
AX=1332 BX=0087 CX=0006 DX=0000 SP=0000 BP=0000 SI=001A DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F OV UP EI PL NZ NA PE CY
13D5:000F 02C3 ADD AL,BL
AX=13B9 BX=0087 CX=0006 DX=0000 SP=0000 BP=0000 SI=001A DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ NA PO NC
13D5:0011 27 DAA
AX=1319 BX=0087 CX=0006 DX=0000 SP=0000 BP=0000 SI=001A DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ NA PO CY ; here also 6 added to MSD; why?
13D5:000E AC LODSB ; fifth data
AX=1316 BX=0087 CX=0005 DX=0000 SP=0000 BP=0000 SI=001B DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI PL NZ NA PO CY
13D5:000F 02C3 ADD AL,BL
AX=139D BX=0087 CX=0005 DX=0000 SP=0000 BP=0000 SI=001B DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ NA PO NC
13D5:0011 27 DAA
AX=1303 BX=0087 CX=0005 DX=0000 SP=0000 BP=0000 SI=001B DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ AC PE CY ; 66 added to binary sum; why?
13D5:000E AC LODSB ; sixth data
AX=1396 BX=0087 CX=0004 DX=0000 SP=0000 BP=0000 SI=001C DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI PL NZ AC PE CY
13D5:000F 02C3 ADD AL,BL
AX=131D BX=0087 CX=0004 DX=0000 SP=0000 BP=0000 SI=001C DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 OV UP EI PL NZ NA PE CY
13D5:0011 27 DAA
AX=1383 BX=0087 CX=0004 DX=0000 SP=0000 BP=0000 SI=001C DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 OV UP EI NG NZ AC PO CY ; 66 added to result of binary addition; why?
13D5:000E AC LODSB ; seventh data
AX=1369 BX=0087 CX=0003 DX=0000 SP=0000 BP=0000 SI=001D DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F OV UP EI NG NZ AC PO CY
13D5:000F 02C3 ADD AL,BL
AX=13F0 BX=0087 CX=0003 DX=0000 SP=0000 BP=0000 SI=001D DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ AC PE NC
13D5:0011 27 DAA
AX=1356 BX=0087 CX=0003 DX=0000 SP=0000 BP=0000 SI=001D DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ AC PE CY ; 66 added, reason out why
13D5:000E AC LODSB ; eighth data
AX=1399 BX=0087 CX=0002 DX=0000 SP=0000 BP=0000 SI=001E DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI PL NZ AC PE CY
13D5:000F 02C3 ADD AL,BL
AX=1320 BX=0087 CX=0002 DX=0000 SP=0000 BP=0000 SI=001E DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 OV UP EI PL NZ AC PO CY
13D5:0011 27 DAA
AX=1386 BX=0087 CX=0002 DX=0000 SP=0000 BP=0000 SI=001E DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 OV UP EI NG NZ AC PO CY ; also 66 added; why?
13D5:0012 E2FA LOOP 000E
AX=1386 BX=0087 CX=0001 DX=0000 SP=0000 BP=0000 SI=001E DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000E OV UP EI NG NZ AC PO CY
13D5:000E AC LODSB ; last data
AX=1367 BX=0087 CX=0001 DX=0000 SP=0000 BP=0000 SI=001F DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F OV UP EI NG NZ AC PO CY
13D5:000F 02C3 ADD AL,BL
AX=13EE BX=0087 CX=0001 DX=0000 SP=0000 BP=0000 SI=001F DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ NA PE NC
13D5:0011 27 DAA
AX=1354 BX=0087 CX=0001 DX=0000 SP=0000 BP=0000 SI=001F DI=0000 DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ AC PO CY ; 66 again! Why?
13D5:0012 E2FA LOOP 000E ; not executed
-q ; quit debug

Exercise: Write a comprehensive analysis of what happens when DAA is executed, so that the result of binary addition is converted to the result of BCD addition. Note that the processor does not know BCD. Its ALU performs only binary operations, and BCD is only the user's interpretation, to which the processor hardware is oblivious. The two hex digits of the sum, together with the AC and CY flags, completely determine whether 00, 06, 60 or 66H is added to the binary sum to make the result match that of BCD addition. The carry of the BCD addition is available in CY after DAA is executed. You can also put a jump instruction at the end of the program, trace through it as many times as you wish, and try several data, to learn more about the behaviour of DAA.
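To check your analysis for the exercise, you can model the DAA "Operation" rules in a few lines. The sketch below is a Python model, not 8086 code. The function names add8, daa and bcd_add are hypothetical, and only the CF and AF flags are modelled (SF, ZF and PF are ignored). It replays the nine augends of the program above against the addend 87H and prints the correction (00, 06, 60 or 66H) that DAA applies to each binary sum.

```python
# Python model of the 8086 sequence ADD AL, BL / DAA (only CF and AF are modelled).

def add8(a, b):
    """8-bit binary ADD: return (AL, CF, AF) as the ALU leaves them."""
    total = a + b
    af = (a & 0x0F) + (b & 0x0F) > 0x0F     # carry out of the low nibble
    return total & 0xFF, total > 0xFF, af

def daa(al, cf, af):
    """Decimal adjust AL after addition, following the DAA 'Operation' pseudocode."""
    old_al, old_cf = al, cf
    cf = False
    if (al & 0x0F) > 9 or af:               # low digit invalid, or carry out of it
        cf = old_cf or (al + 6 > 0xFF)      # carry produced by adding 6
        al = (al + 6) & 0xFF
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:             # high digit invalid, or carry out of the byte
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf, af

def bcd_add(a, b):
    """ADD followed by DAA; returns (AL, CF)."""
    al, cf, af = add8(a, b)
    al, cf, _ = daa(al, cf, af)
    return al, cf

if __name__ == "__main__":
    # The nine augends from the program, each added to the addend 87H held in BL.
    for augend in (0x12, 0x19, 0x91, 0x32, 0x16, 0x96, 0x69, 0x99, 0x67):
        binary, _, _ = add8(augend, 0x87)
        al, cf = bcd_add(augend, 0x87)
        correction = (al - binary) & 0xFF   # one of 00, 06, 60, 66 H
        print(f"{augend:02X}+87: binary {binary:02X} -> DAA {al:02X} "
              f"CY={int(cf)} (added {correction:02X}H)")
```

Run as a script, it should reproduce the AL and CY values seen after each DAA in the trace above: 99 with no carry for 12H, then 06, 78, 19, 03, 83, 56, 86 and 54, each with CY set.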

• DAS: Adjusts the result of the subtraction of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one 2-digit, packed BCD value from another and stores a byte result in the AL register. The DAS instruction then adjusts the contents of the AL register to contain the correct 2-digit, packed BCD result. If a decimal borrow is detected, the CF and AF flags are set accordingly.

Operation:
old_AL ← AL;
old_CF ← CF;          (* AL and CF stored in temporary registers *)
CF ← 0;
IF (((AL AND 0FH) > 9) OR AF = 1)
THEN
    AL ← AL − 6;
    CF ← old_CF OR (Borrow from AL ← AL − 6);
    AF ← 1;
ELSE
    AF ← 0;
(* The first IF ends here. *)
IF ((old_AL > 99H) OR (old_CF = 1))
THEN
    AL ← AL − 60H;
    CF ← 1;
ELSE
    CF ← 0;

Example: The execution of the sequence of instructions SUB and DAS is shown below, with details before and after the execution of each instruction.

SUB AL, BL:  Before: AL=35H BL=47H FLAGS(OSZAPC)=XXXXXX
             After:  AL=EEH BL=47H FLAGS(OSZAPC)=010111
DAS:         Before: AL=EEH BL=47H FLAGS(OSZAPC)=010111
             After:  AL=88H BL=47H FLAGS(OSZAPC)=X10111

The result 88H with CF = 1 is the BCD form of 35 − 47 = −12: a borrow out of the byte, and 100 − 12 = 88.

Flags Affected by the DAS instruction: The CF and AF flags are set if the adjustment of the value results in a decimal borrow in either digit of the result (see the “Operation” section above). The SF, ZF, and PF flags are set according to the result. The OF flag is undefined.
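A similar Python model follows the DAS "Operation" steps. It is again not 8086 code: sub8 and das are hypothetical names, and only CF and AF are modelled. It reproduces the worked example. SUB leaves EEH with CF and AF set, and DAS turns this into 88H with CF still set.

```python
# Python model of SUB AL, BL / DAS (only CF and AF are modelled).

def sub8(a, b):
    """8-bit binary SUB a - b: return (AL, CF, AF); CF and AF signal a borrow."""
    return (a - b) & 0xFF, a < b, (a & 0x0F) < (b & 0x0F)

def das(al, cf, af):
    """Decimal adjust AL after subtraction, following the DAS 'Operation' pseudocode."""
    old_al, old_cf = al, cf
    cf = False
    if (al & 0x0F) > 9 or af:
        cf = old_cf or al < 6               # borrow produced by subtracting 6
        al = (al - 6) & 0xFF
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:
        al = (al - 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf, af

# The worked example from the text: AL=35H, BL=47H.
al, cf, af = sub8(0x35, 0x47)
print(f"after SUB: AL={al:02X} CF={int(cf)} AF={int(af)}")
al, cf, af = das(al, cf, af)
print(f"after DAS: AL={al:02X} CF={int(cf)} AF={int(af)}")
```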

• AAA: ASCII adjust AL after addition. The AAA instruction adjusts the sum of two unpacked BCD values to create an unpacked BCD result. It also works on two ASCII values, since AAA clears the upper nibble of AL and does not depend on the CF flag for its operation. The AL register is the implied source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result. If the addition produces a decimal carry, the AH register is incremented by 1, and the CF and AF flags are set. If there was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either case, bits 4 through 7 of the AL register are set to 0. The operational details are as follows:

IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
    AL ← AL + 6;
    AH ← AH + 1;
    AF ← 1;
    CF ← 1;
ELSE
    AF ← 0;
    CF ← 0;
AL ← AL AND 0FH;

Flags Affected: The AF and CF flags are set to 1 if the adjustment results in a decimal carry; otherwise they are set to 0. The OF, SF, ZF, and PF flags are undefined.

An example of executing AAA after an ADD instruction in the debug environment is shown below:

-a ; assemble at 100 (default)
13D5:0100 mov al, 36 ; ASCII character ‘6’
13D5:0102 mov bl, 39 ; ASCII character ‘9’
13D5:0104 add al, bl ; binary add
13D5:0106 aaa        ; adjust for unpacked BCD after ASCII add
13D5:0107
-u 100 106 ; unassemble between 100 and 106
13D5:0100 B036  MOV AL,36 ; ASCII value taken here
13D5:0102 B339  MOV BL,39
13D5:0104 02C3  ADD AL,BL
13D5:0106 37    AAA
-r ; show initial register contents
AX=0000 BX=0000 CX=0009 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000 DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0000 NV UP EI PL NZ NA PO NC
13D5:0100 B036  MOV AL,36
-t4 ; execute and trace four instructions
AX=0036 BX=0000 CX=0009 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000 DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0002 NV UP EI PL NZ NA PO NC
13D5:0102 B339  MOV BL,39
AX=0036 BX=0039 CX=0009 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000 DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0004 NV UP EI PL NZ NA PO NC
13D5:0104 02C3  ADD AL,BL
AX=006F BX=0039 CX=0009 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000 DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI PL NZ NA PE NC
13D5:0106 37    AAA
AX=0105 BX=0039 CX=0009 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000 DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ AC PO CY
-q ; quit debug

Note: It helps if AH is zero before AAA is executed. Any decimal carry produced here (sum > 9) will then appear directly in AH. If AH holds any data, it will simply be incremented. You can try this in the debug environment as an additional experiment.
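If AX is treated as a 16-bit integer, the role of AH in AAA becomes visible. The sketch below is a Python model, not 8086 code; add_to_al and aaa are hypothetical names. It follows the AAA pseudocode above and reproduces the debug result: '6' + '9' leaves 6FH in AL, and AAA turns AX into 0105H with CF set. If AH already holds data, the model simply increments it, as the note says.

```python
# Python model of ADD AL, BL / AAA with AX as a 16-bit integer.

def add_to_al(ax, b):
    """ADD AL, b: 8-bit add on AL only (AH untouched); return (AX, AF)."""
    ah, al = ax >> 8, ax & 0xFF
    af = (al & 0x0F) + (b & 0x0F) > 0x0F    # carry out of the low nibble
    return (ah << 8) | ((al + b) & 0xFF), af

def aaa(ax, af):
    """ASCII adjust AL after addition, following the pseudocode in the text."""
    ah, al = ax >> 8, ax & 0xFF
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF
        ah = (ah + 1) & 0xFF                # decimal carry goes into AH
        af = cf = True
    else:
        af = cf = False
    al &= 0x0F                              # upper nibble of AL is always cleared
    return (ah << 8) | al, cf, af

ax, af = add_to_al(0x0036, 0x39)            # '6' + '9' with AH = 0
print(f"after ADD: AX={ax:04X}")
ax, cf, af = aaa(ax, af)
print(f"after AAA: AX={ax:04X} CF={int(cf)}")
```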

• AAS: ASCII adjust AL after subtraction: Adjusts the result of the subtraction of two unpacked BCD values (or ASCII values, as in AAA) to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one unpacked BCD value from another and stores a byte result in the AL register. The AAS instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result. If the subtraction produced a decimal borrow, the AH register is decremented by 1, and the CF and AF flags are set. If no decimal borrow occurred, the CF and AF flags are cleared, and the AH register is unchanged. In either case, the AL register is left with its top nibble set to 0.

Operation:
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
    AL ← AL – 6;
    AH ← AH – 1;
    AF ← 1;
    CF ← 1;
ELSE
    CF ← 0;
    AF ← 0;
AL ← AL AND 0FH;

Flags Affected: The AF and CF flags are set to 1 if there is a decimal borrow; otherwise, they are set to 0. The OF, SF, ZF, and PF flags are undefined.

Exercise: Study the AAS instruction in the debug environment.
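Before trying the exercise in debug, you can predict the results with a Python model of SUB AL, BL followed by AAS. The model is not 8086 code; sub_from_al and aas are hypothetical names, and AH is assumed to start at 0. '8' − '3' should leave AX = 0005 with no borrow. '3' − '7' should borrow: AL becomes 06 (10 − 4) and AH is decremented from 00 to FF.

```python
# Python model of SUB AL, BL / AAS with AX as a 16-bit integer.

def sub_from_al(ax, b):
    """SUB AL, b: 8-bit subtract on AL only (AH untouched); return (AX, AF)."""
    ah, al = ax >> 8, ax & 0xFF
    af = (al & 0x0F) < (b & 0x0F)           # borrow into the low nibble
    return (ah << 8) | ((al - b) & 0xFF), af

def aas(ax, af):
    """ASCII adjust AL after subtraction, following the AAS 'Operation' pseudocode."""
    ah, al = ax >> 8, ax & 0xFF
    if (al & 0x0F) > 9 or af:
        al = (al - 6) & 0xFF
        ah = (ah - 1) & 0xFF                # decimal borrow taken from AH
        af = cf = True
    else:
        af = cf = False
    al &= 0x0F                              # top nibble of AL is always cleared
    return (ah << 8) | al, cf, af

for first, second in ((0x38, 0x33), (0x33, 0x37)):   # '8' - '3' and '3' - '7'
    ax, af = sub_from_al(first, second)              # AH starts at 0
    ax, cf, af = aas(ax, af)
    print(f"{chr(first)} - {chr(second)}: AX={ax:04X} CF={int(cf)}")
```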

• AAM: ASCII adjust AX after multiply: Adjusts the result of the multiplication of two unpacked BCD values to create a pair of unpacked (base 10) BCD values. The AX register is the implied source and destination operand for this instruction. The AAM instruction is only useful when it follows a MUL instruction that multiplies (binary multiplication) two unpacked BCD values and stores a word result in the AX register. The AAM instruction then adjusts the contents of the AX register to contain the correct 2-digit unpacked (base 10) BCD result.

The generalized version of this instruction allows adjustment of the contents of AX to create two unpacked digits of any number base (see the “Operation” section below). Here, the imm8 byte is set to the selected number base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). All assemblers interpret the AAM mnemonic to mean adjust to ASCII (base 10) values. To adjust to values in another number base, the instruction must be hand coded in machine code (D4 imm8).

Operation:
tempAL ← AL;
AH ← tempAL / imm8;     (* integer division; imm8 is set to 0AH for the AAM mnemonic *)
AL ← tempAL MOD imm8;
The immediate value (imm8) is taken from the second byte of the instruction.

Flags Affected: The SF, ZF, and PF flags are set according to the resulting binary value in the AL register. The OF, AF, and CF flags are undefined.

The following is an example of hand coding for the base 12 conversion in the debug environment:

-rax
AX 0000
:2070 ; AH has irrelevant data 20H and AL has the data 70H
      ; (= 94 in the base 12 system, = 112 in the decimal system)
-e cs:100 ; enter the hand code D40C at the default assembly
          ; address 100H (in debug) in the code segment (cs)
1377:0100 07.d4 BB.c
-u 100 101 ; unassemble the first 2 bytes of code
1377:0100 D40C AAM 0C ; Note how the hand coded instruction gets unassembled.
                      ; But if we try to give AAM 0C as an instruction, it
                      ; will produce an error in debug or when assembled in MASM.
-r ; display register contents
AX=2070 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 D40C AAM 0C
-t
AX=0904 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0102 NV UP EI PL NZ NA PO NC
; AH = 09 and AL = 04: the two base-12 digits of 94
-q ; quit debug

Exercise: Study the regular AAM instruction (D4 0A) in the debug environment.
Note: The AAM instruction can be used to obtain the 2-digit unpacked BCD equivalent, in register AX, of a binary value less than 64H (= 100 decimal) held in register AL. With the general form of the hand coded instruction, the same idea applies to conversion into other bases. You may also check what happens when this instruction is used with, say, FFH in register AL.
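The AAM operation is simply an integer division of AL by imm8. A Python sketch shows this; it is not 8086 code, and the function name aam is hypothetical. It reproduces the base-12 debug result (AX = 2070H becomes 0904H, since the old AH is ignored) and the decimal case, in which 3FH (63) becomes the unpacked digits 06 03.

```python
# Python model of AAM (opcode D4 imm8); AX is treated as a 16-bit integer.

def aam(ax, imm8=0x0A):
    """AH = AL / imm8 (integer division), AL = AL mod imm8; the old AH is ignored.
    imm8 = 0AH is what the AAM mnemonic assembles to; other bases need hand coding."""
    temp_al = ax & 0xFF
    return ((temp_al // imm8) << 8) | (temp_al % imm8)

print(f"{aam(0x2070, 0x0C):04X}")   # hand coded D4 0C: base 12 example from the text
print(f"{aam(0x003F):04X}")         # regular AAM: 63 decimal -> unpacked BCD 06 03
```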

• AAD: ASCII adjust AX before division: Adjusts two unpacked BCD digits (the least-significant digit in the AL register and the most significant digit in the AH register) so that a division operation performed on the result will yield a correct unpacked BCD value. The AAD instruction is only useful when it precedes a DIV instruction that divides (binary division) the adjusted value in the AX register by an unpacked BCD value. The AAD instruction sets the value in the AL register to (AL + (10 * AH)), and then clears the AH register to 00H. The value in the AX register is then equal to the binary equivalent of the original unpacked two-digit (base 10) number in registers AH and AL.

The generalized version of this instruction allows adjustment of two unpacked digits of any number base (see the “Operation” section below), by setting the imm8 byte to the selected number base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). All assemblers interpret the AAD mnemonic to mean adjust ASCII (base 10) values. To adjust values in another number base, the instruction must be hand coded in machine code (D5 imm8).

Operation:
tempAL ← AL;
tempAH ← AH;
AL ← (tempAL + (tempAH ∗ imm8)) AND FFH;   (* imm8 is set to 0AH for the AAD mnemonic *)
AH ← 0;
The immediate value (imm8) is taken from the second byte of the instruction.

Flags Affected: The SF, ZF, and PF flags are set according to the resulting binary value in the AL register; the OF, AF, and CF flags are undefined.

Note: This instruction can be used to convert a 2-digit unpacked BCD number in AX to its binary (hex) equivalent in AL. The generalized hand coded version will be useful for doing the same in any base less than 16 decimal (why less than 16? Try to reason it out).
Exercise: Check the regular and the hand coded versions in the debug environment.

Hand coding in the .asm file is demonstrated below.

HAND CODING IN THE .ASM FILE

code segment
assume cs:code
start:  mov ax, 050AH
        dw 0BD5H    ; hand coded AAD instruction, with 0B or
                    ; 11 decimal after code D5
                    ; note the instruction word is D50B
                    ; but loaded in memory with LS byte first.
                    ; the base of conversion is now 0B or 11 decimal
        int 01      ; return control to debug
code ends
end start

The above file can now be assembled and linked, using the masm and link programs, to produce a .exe file. This file can be executed and examined in the debug environment, as demonstrated below.

-u 0 6 ; unassemble the code from offset 0 to 6
13D5:0000 B80A05  MOV AX,050A
13D5:0003 D50B    AAD 0B
13D5:0005 CD01    INT 01
-r ; display registers
AX=0000 BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000 DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0000 NV UP EI PL NZ NA PO NC
13D5:0000 B80A05  MOV AX,050A
-t2 ; trace execution of two instructions
AX=050A BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000 DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0003 NV UP EI PL NZ NA PO NC
13D5:0003 D50B    AAD 0B
AX=0041 BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000 DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0005 NV UP EI PL NZ AC PE NC
13D5:0005 CD01    INT 01 ; not executed
; The base-11 number 5A (AH = 05, AL = 0A) equals 5 × 11 + 10 = 65 decimal = 41H,
; validating the result seen in register AX.
-q ; return to DOS
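AAD is the reverse of AAM: one multiply-and-add that packs two digits into a binary value. The Python sketch below is not 8086 code, and the function name aad is hypothetical. It reproduces the hand coded base-11 result (050AH becomes 0041H) and the decimal case, in which the unpacked BCD 04 07 becomes 2FH (47).

```python
# Python model of AAD (opcode D5 imm8); AX is treated as a 16-bit integer.

def aad(ax, imm8=0x0A):
    """AL = (AL + AH * imm8) AND FFH, AH = 0; returns the new AX (equal to the new AL).
    imm8 = 0AH is what the AAD mnemonic assembles to; other bases need hand coding."""
    ah, al = ax >> 8, ax & 0xFF
    return (al + ah * imm8) & 0xFF

print(f"{aad(0x050A, 0x0B):04X}")   # hand coded D5 0B: digits 5, A in base 11
print(f"{aad(0x0407):04X}")         # regular AAD: unpacked BCD 04 07 -> 47 decimal
```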

Question: How is ASCII involved in AAM and AAD instructions? [Hint: Two-digit unpacked BCD in a 16-bit register can be easily converted to two ASCII characters of the decimal digits. Check how.]

4. Logical Instructions: The logical instructions perform the basic logical operations AND, OR, NOT and EX-OR on bytes or words in a bitwise fashion. A further instruction, TEST, can be thought of as a logical compare. This instruction does a bitwise AND of the two operands but does not place the result in the destination register; only the nature of the result is recorded in the flag register.

Many logical operations are possible, so why are only these four functions (AND, OR, NOT and EX-OR) chosen? The answer is that they let the programmer handle individual bits of a word selectively. Suppose we want to selectively set bit 3 (the 4th bit from the right) in a byte. We can use a mask with a 1 in that bit and 0 in all other bits, so the mask will be 0000 1000. If we OR the data byte with this mask, no other bit is changed, but bit 3 is set irrespective of its condition in the original data byte. Similarly, AND can be used to selectively clear a specific bit irrespective of its original condition. The mask required is the complement of the mask used for ORing above (1111 0111). A 1 does not alter a data bit on ANDing, but a 0 clears it. EX-OR is similarly useful for selective toggling: a 1 toggles the data bit when EX-ORed, but a 0 does not. NOT is useful for finding the 1’s complement of a full data word. The group AND, OR, NOT forms a universal (functionally complete) logic group. This means that any logic function can be generated by applying these three functions appropriately on a bitwise basis, so no further logic functions are needed. The EX-OR function is also useful in data comparisons. If we EX-OR two bytes or words, the result is entirely zero exactly when the two values are equal; every bit is then zero, and the zero flag is set to indicate this condition clearly. With this introduction, we will now look at these instructions in detail.
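The mask technique described above can be tried on ordinary integers. The following is a Python sketch, not 8086 code. The names BIT3, set_bit3, clear_bit3, toggle_bit3 and ones_complement are hypothetical, and the mask 0000 1000 from the text (a 1 in bit 3, counting from bit 0 at the right) is assumed. Each function mirrors one instruction: OR sets the bit, AND with the complemented mask clears it, XOR toggles it, and NOT complements the whole byte.

```python
# Bit-manipulation masks as used with OR, AND, XOR and NOT (8-bit data assumed).

BIT3 = 0b0000_1000                      # mask with a 1 only in bit 3

def set_bit3(byte):
    return byte | BIT3                  # OR: set bit 3, other bits unchanged

def clear_bit3(byte):
    return byte & (~BIT3 & 0xFF)        # AND with the complement mask 1111 0111

def toggle_bit3(byte):
    return byte ^ BIT3                  # XOR: flip bit 3 only

def ones_complement(byte):
    return ~byte & 0xFF                 # NOT: flip every bit of the byte

data = 0b1010_0101
for result in (set_bit3(data), clear_bit3(data), toggle_bit3(data), ones_complement(data)):
    print(f"{result:08b}")
print((data ^ data) == 0)               # EX-OR of equal values is zero (ZF would be set)
```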

• AND: Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is set to 1 if both corresponding bits of the first and second operands are 1; otherwise, it is set to 0.

Operation: DEST ← DEST AND SRC;
Flags Affected: The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.

• OR: Performs a bitwise inclusive OR operation between the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result of the OR instruction is set to 0 if both corresponding bits of the first and second operands are 0; otherwise it is set to 1.

Operation: DEST ← DEST OR SRC;
Flags Affected: The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.

• XOR: Performs a bitwise exclusive OR (XOR) operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is 1 if the corresponding bits of the operands are different; each bit is 0 if the corresponding bits are the same.

Operation: DEST ← DEST XOR SRC;
Flags Affected: The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.

• NOT: Performs a bitwise NOT operation on the destination operand (each 1 is set to 0, and each 0 is set to 1). This forms the 1’s complement of the operand, and the result is stored in the destination operand location. The destination operand can be a register or a memory location.

Operation: DEST ← NOT DEST;
Flags Affected: None.

Exercise on Logical instructions: The register AX has some unknown data. Give a single instruction that will produce a 1 in the 4th and the 12th bit from the left in AX without altering the other bits. If the mask used in the above case is used with the AND instruction, what will happen to the data in AX? Give an instruction using XOR logic that will produce the same result on a register as the NOT instruction does.

• TEST: Bitwise AND the two source operands and discard the outcome, but preserve the nature of the result in the flag register. This instruction computes the bitwise logical AND of the first operand (source 1 operand) and the second operand (source 2 operand) and sets the SF, ZF, and PF status flags according to the result. The result is then discarded.

Operation:
TEMP ← SRC1 AND SRC2;
SF ← MSB(TEMP);
IF TEMP = 0
THEN ZF ← 1;
ELSE ZF ← 0;
PF ← Parity of the lower 8 bits of TEMP;
CF ← 0;
OF ← 0;
(* AF is undefined *)

Flags Affected: The OF and CF flags are set to 0. The SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.

5. Shift and Rotate Instructions: Shift and rotate instructions shift the data by one or more bits towards either the left or the right, straight or in a circular fashion. The carry flag is always involved in these operations. There are in all 3 shift instructions and 4 rotate instructions. The shift/rotate count can be 1, specified as such in the instruction, or a multi-bit count taken from register CL. The 8086 processor performs multi-bit shifts using the full count in the CL register, taking 4 clocks for each bit shifted. The higher-end processors, from the 286 onwards, mask the count and use only its lower 5 bits as the shift count. These processors also permit multi-bit shift counts to be specified as immediate data in the instruction, while the 8086 allows only a single-bit shift to be specified directly in the instruction. SHL BX, 15H is an invalid instruction on the 8086 (only SHL BX, 1 is valid), but it is valid on the higher-end processors starting from the 80286. We will now go into the details.

• SAL/SHL/SAR/SHR: The shift instructions, although shown with four separate mnemonics, are only three separate instructions. SAL and SHL are the same, but SAR and SHR are not so. (The debug will only accept the code SHL and indicates a fault on SAL. But MASM accepts both and produces the same code for both.) These instructions shift the bits in the first operand (destination operand) to the left or right by the number of bits specified in the second operand (count operand). Bits shifted beyond the destination operand boundary are first shifted into the CF flag, and then discarded. At the end of the shift operation, the CF flag contains the last bit shifted out of the destination operand. The destination operand can be a register or a memory location. The count operand can be the immediate value of 1, or it can be any 8-bit value in register CL for multiple shifts. The shift arithmetic left (SAL) and shift logical left (SHL) instructions perform the same operation; they shift the bits in the destination operand to the left (toward more significant bit locations). For each shift count, the most significant bit of the destination operand is shifted into the CF flag, and the least significant bit is cleared. The shift arithmetic right (SAR) and shift logical right (SHR) instructions are different instructions as described below. They do the right shift of the bits of the destination operand (toward less significant bit locations). For each shift count, the least significant bit of the destination operand is shifted into the CF flag, and the most significant bit is either set or cleared depending on the instruction type. The SHR instruction clears the most significant bit, and the SAR instruction sets or clears the most significant bit to correspond to the sign (most significant bit) of the original value in the destination operand. In effect, the SAR instruction fills the empty bit position’s shifted value with the sign of the unshifted value. 
The SAR and SHR instructions can be used to perform signed or unsigned division, respectively, of the destination operand by powers of 2. For example, using the SAR instruction to shift a signed integer 1 bit to the right divides the value by 2. However, using the SAR instruction to perform a division operation does not produce the same result as the IDIV instruction. The quotient from the IDIV instruction is rounded toward zero, whereas the “quotient” of the SAR instruction is rounded toward negative infinity. This difference is apparent only for negative numbers. For example, when the IDIV instruction is used to divide -9 by 4, the result is -2 with a remainder of -1. If the SAR instruction is used to shift -9 right by two bits, the result is -3 and the “remainder” is +3; however, the SAR instruction stores only the most significant bit of the remainder (in the CF flag). The OF flag is affected only on 1-bit shifts. For left shifts, the OF flag is set to 0 if the most significant bit of the result is the same as the CF flag (that is, if the top two bits of the original operand were the same); otherwise, it is set to 1. For the SAR instruction, the OF flag is cleared for all 1-bit shifts. Execution of the SHR instruction sets the OF flag to correspond to the most significant bit of the original operand. (The use of the OF flag in these instructions is not indicated in the manual.)
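The rounding difference between SAR and IDIV can be checked in Python, whose >> operator on negative integers is also an arithmetic (floor) shift. The helpers below are hypothetical. sar16 shifts a 16-bit pattern one bit at a time and keeps the last bit shifted out as CF, and idiv_trunc mimics IDIV's round-toward-zero quotient with a remainder that takes the dividend's sign. For -9 shifted right by 2, sar16 gives -3 with CF = 1 (the top bit of the "remainder" 3 = 11B), while idiv_trunc gives (-2, -1).

```python
# Contrast SAR (round toward -infinity) with IDIV (round toward zero) on 16-bit data.

def to_signed16(pattern):
    """Interpret a 16-bit pattern as a two's-complement signed value."""
    pattern &= 0xFFFF
    return pattern - 0x10000 if pattern & 0x8000 else pattern

def sar16(pattern, count):
    """SAR on a 16-bit operand, one bit at a time: return (result pattern, CF)."""
    cf = 0
    for _ in range(count):
        cf = pattern & 1                                # bit shifted out goes to CF
        pattern = (pattern >> 1) | (pattern & 0x8000)   # sign bit is replicated
    return pattern, cf

def idiv_trunc(dividend, divisor):
    """Quotient rounded toward zero, remainder with the dividend's sign (as IDIV)."""
    q = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        q = -q
    return q, dividend - q * divisor

result, cf = sar16(-9 & 0xFFFF, 2)
print(to_signed16(result), cf)      # SAR AX, CL with CL = 2
print(idiv_trunc(-9, 4))            # IDIV by 4
```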

Exercises: Study the case of dividing -9 by +4 using SAR in the debug environment, and compare it with the operation using the IDIV instruction. Can you think of a way of getting the full remainder +3 on dividing -9 in register AX by 4, using the SAR instruction? [Hint: Try using SAR AX, 1 twice, instead of using SAR AX, CL with 2 in CL, and obtain the full remainder from the carries of the two operations.] Do you think the SHL instruction can do multiplication by powers of 2? Explain. Demonstrate in the debug environment that a register can be cleared using the SHR or SHL instructions. [Answer: See the following program.]

-a
1377:0100 mov ax, 1234 ; any random data in AX
1377:0103 mov cl, 12   ; shift count in CL is 18 decimal or 12 hex
1377:0105 shl ax, cl   ; left shift
1377:0107 mov ax, 5678 ; fresh random data loaded in AX
1377:010A shr ax, cl   ; right shift by 18 bits
-u 100 10b
1377:0100 B83412  MOV AX,1234
1377:0103 B112    MOV CL,12
1377:0105 D3E0    SHL AX,CL
1377:0107 B87856  MOV AX,5678
1377:010A D3E8    SHR AX,CL
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 B83412  MOV AX,1234
-t5
AX=1234 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0103 NV UP EI PL NZ NA PO NC
1377:0103 B112    MOV CL,12
AX=1234 BX=0000 CX=0012 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0105 NV UP EI PL NZ NA PO NC
1377:0105 D3E0    SHL AX,CL
AX=0000 BX=0000 CX=0012 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0107 NV UP EI PL ZR NA PE NC
1377:0107 B87856  MOV AX,5678
; NOTE: The shift count is more than 16, and hence 16 0’s are shifted into AX.
AX=5678 BX=0000 CX=0012 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=010A NV UP EI PL ZR NA PE NC
1377:010A D3E8    SHR AX,CL
AX=0000 BX=0000 CX=0012 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=010C NV UP EI PL ZR NA PE NC
-q

• RCL, RCR, ROL, and ROR: Rotate instructions, rotate including carry (RCL, RCR) and rotate only the register (ROL, ROR): These instructions shift (rotate) the bits of the first operand (destination operand) by the number of bit positions specified in the second operand (count operand) and store the result in the destination operand. The destination operand can be a register or a memory location; the count operand is either the immediate value 1 or a value in the CL register. The rotate left (ROL) and rotate through carry left (RCL) instructions shift all the bits toward more-significant bit positions, except for the most-significant bit, which is rotated to the least-significant bit location. The rotate right (ROR) and rotate through carry right (RCR) instructions shift all the bits toward less-significant bit positions, except for the least-significant bit, which is rotated to the most-significant bit location. The RCL and RCR instructions include the CF flag in the rotation. The RCL instruction shifts the CF flag into the least-significant bit and shifts the most-significant bit into the CF flag. The RCR instruction shifts the CF flag into the most-significant bit and shifts the least-significant bit into the CF flag. For the ROL and ROR instructions, the original value of the CF flag is not a part of the result, but the CF flag receives a copy of the bit that was shifted from one end to the other. The OF flag is defined only for the 1-bit rotates; it is undefined in all other cases (except that a zero-bit rotate does nothing, that is, affects no flags). For left rotates, the OF flag is set to the exclusive OR of the CF bit (after the rotate) and the most-significant bit of the result. For right rotates, the OF flag is set to the exclusive OR of the two most-significant bits of the result. The 8086 does not mask the rotation count.
However, all other IA-32 processors (starting with the Intel 286 processor) do mask the rotation count to 5 bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions. The SF, ZF, AF and PF flags are not affected by the rotate instructions.

Exercises: Study the rotate instructions in the debug environment. For the 8086 processor, show that for the ROL and ROR instructions, the result of rotations using the CL register for the shift count is independent of the upper nibble of CL. This upper nibble only increases the execution time of the instruction.
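The difference between rotating only the register (ROL) and rotating through carry (RCL) is easiest to see at bit level. The Python sketch below is not 8086 code. The names rol16 and rcl16 are hypothetical, 16-bit operands are assumed, and OF is not modelled. Both functions take the incoming CF, so a zero count leaves CF unchanged, as on the processor. With 8001H and CF = 0, ROL by 1 gives 0003H (the MSB re-enters at bit 0) with CF = 1, whereas RCL by 1 gives 0002H (the old CF enters bit 0) with CF = 1.

```python
# Python model of ROL and RCL on 16-bit operands (OF not modelled).

def rol16(value, count, cf=0):
    """ROL: the MSB re-enters at bit 0 and a copy of it goes to CF."""
    for _ in range(count):
        cf = (value >> 15) & 1
        value = ((value << 1) & 0xFFFF) | cf
    return value, cf

def rcl16(value, count, cf=0):
    """RCL: the old CF enters at bit 0 and the MSB goes to CF (a 17-bit rotate)."""
    for _ in range(count):
        msb = (value >> 15) & 1
        value = ((value << 1) & 0xFFFF) | cf
        cf = msb
    return value, cf

print(rol16(0x8001, 1))          # MSB comes back in at bit 0
print(rcl16(0x8001, 1, cf=0))    # MSB goes to CF; old CF (0) enters bit 0
```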

6. Control Transfer Instructions: The intelligence in any program lies in its ability to follow different courses of action based on intermediate results produced while the program runs; this is what enables a program to perform data-sensitive tasks. This capability is obtained by context-sensitive jump operations, in contrast to the normal sequential operation of fetching the next instruction and executing it, in the order in which it is found in the program. So the set of control transfer instructions to be discussed now gives the processor all its intelligence and its raw power. The course of action is decided by the nature of the intermediate results. When we speak of the nature of the data or of partial results, we do not mean the exact value of the data. We mean whether it is positive, negative or zero, whether it is larger or smaller than another data, and so on. The flag register keeps track of this sort of information about the intermediate results. It is therefore natural that the transfer of control depends heavily on the flag register, as we will see from the details of these instructions. We will now look at these instructions one by one.

• JMP: Jump instruction: Transfers program control to a different point in the instruction stream without recording return information. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location. This instruction can be used to execute three different types of jumps:
Near jump: a jump to an instruction within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment jump.
Short jump: a near jump whose range is limited to -128 to +127 bytes from the current IP value (the address of the instruction following the JMP).
Far jump: a jump to an instruction located in a segment other than the current code segment, sometimes referred to as an intersegment jump.
Near and Short Jumps: When executing a near jump, the processor jumps to the address (within the current code segment) that is specified with the target operand. The target operand specifies either an absolute offset (that is, an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current value of the instruction pointer in the IP register). A near jump with an 8-bit relative offset (rel8) is referred to as a short jump. The CS register is not changed on near and short jumps. For near jumps, an absolute offset can only be given indirectly, in a general-purpose register or a memory location (r/m16); a target written directly in the instruction, such as JMP 1234 below, is encoded as a relative displacement. The following is a study of jump instructions in the debug environment.

-a
1377:0100 jmp 112              ; coded as short relative
1377:0102 jmp 1234             ; coded as relative, but with a 16-bit displacement
1377:0105 jmp bx               ; register direct; BX has the address
1377:0107 jmp [bx]             ; register indirect (near) jump
1377:0109 jmp wordptr [bx]     ; same as above
1377:010B jmp dwordptr[bx]     ; interpreted same as above? See unassembly.
1377:010D jmp far[bx]          ; register indirect far jump
1377:010F jmp near [bx]        ; register indirect near jump
1377:0111 jmp far bx           ; error, as a far jump requires 32 bits of
              ^ Error          ; address, while BX can only store 16 bits
1377:0111 jmp far [1234]       ; far jump to the address stored at DS:1234
1377:0115
-u100 114
1377:0100 EB10       JMP 0112        ; near, short, rel-8 address
1377:0102 E92F11     JMP 1234        ; near, rel-16 address
1377:0105 FFE3       JMP BX          ; BX → IP
1377:0107 FF27       JMP [BX]        ; word at the address in BX → IP
1377:0109 FF27       JMP [BX]
1377:010B FF27       JMP [BX]        ; interpreted only as near jump?
1377:010D FF2F       JMP FAR [BX]    ; dword at [BX] → CS:IP
1377:010F FF27       JMP [BX]        ; word at [BX] → IP
1377:0111 FF2E3412   JMP FAR [1234]  ; dword at [1234] → CS:IP
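The relationship between the target typed in the debug and the displacement that ends up in the machine code can be checked with a few lines of Python. This sketch models only the two direct intra-segment forms seen above (EB rel8 and E9 rel16), preferring the short form when it fits, as DEBUG did in this session; `encode_jmp` and `jump_target` are hypothetical helper names.

```python
def encode_jmp(addr, target):
    """Encode a direct intra-segment JMP located at offset addr.

    Uses the 2-byte short form EB rel8 when the displacement fits,
    otherwise the 3-byte near form E9 rel16.
    """
    rel8 = target - (addr + 2)               # relative to the next instruction
    if -128 <= rel8 <= 127:
        return bytes([0xEB, rel8 & 0xFF])
    rel16 = (target - (addr + 3)) & 0xFFFF   # wraps within the 64 KB segment
    return bytes([0xE9, rel16 & 0xFF, rel16 >> 8])

def jump_target(addr, code):
    """Recover the target offset from an EB or E9 jump at offset addr."""
    if code[0] == 0xEB:
        rel = code[1] - 256 if code[1] > 127 else code[1]   # sign-extend rel8
        return (addr + 2 + rel) & 0xFFFF
    rel = code[1] | (code[2] << 8)
    return (addr + 3 + rel) & 0xFFFF
```

For example, `encode_jmp(0x0102, 0x1234)` yields the bytes E9 2F 11 shown in the unassembly, since 0105 + 112F = 1234.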

• Conditional Jump Instructions: There are several conditional jump instructions, shown in the table below. Their conditions are described either directly in terms of the flags, like JC (jump if carry), or in terms of the problem requirements, like JA (jump if above), whose condition in terms of the flags is a little more involved: it causes a jump only if the carry and zero flags are both reset. The abbreviation cb (code byte) in the opcode column denotes the one-byte displacement, shown as rel8 in the instruction column; it gives the jump target relative to the address of the instruction following the jump, so the target must lie in the range -128 to +127 from that address. Note also that the terms ‘above’ and ‘below’ refer to the comparison of two data items treated as unsigned integers, while ‘greater’ and ‘less’ refer to signed integer comparison. These conditional jump instructions are normally used after subtract or compare instructions. The 8086 has no near (16-bit displacement) or far forms of the conditional jumps; only the short relative form exists. You may also note that all the opcodes in the table start with hex digit 7 (except JCXZ, which is a special instruction anyway), so there can be only 16 distinct instructions, yet the table shows 30 entries excluding JCXZ. Twelve of the 16 opcodes therefore have two or more mnemonics, and four have only a single mnemonic. Identify the equivalent mnemonics and check that they represent the same condition. Identify also the opcodes that have a single mnemonic.

No.  Opcode  Instruction  Description
1.   77 cb   JA rel8      Jump short if above (CF=0 and ZF=0)
2.   73 cb   JAE rel8     Jump short if above or equal (CF=0)
3.   72 cb   JB rel8      Jump short if below (CF=1)
4.   76 cb   JBE rel8     Jump short if below or equal (CF=1 or ZF=1)
5.   72 cb   JC rel8      Jump short if carry (CF=1)
6.   E3 cb   JCXZ rel8    Jump short if CX register is 0
7.   74 cb   JE rel8      Jump short if equal (ZF=1)
8.   7F cb   JG rel8      Jump short if greater (ZF=0 and SF=OF)
9.   7D cb   JGE rel8     Jump short if greater or equal (SF=OF)
10.  7C cb   JL rel8      Jump short if less (SF≠OF)
11.  7E cb   JLE rel8     Jump short if less or equal (ZF=1 or SF≠OF)
12.  76 cb   JNA rel8     Jump short if not above (CF=1 or ZF=1)
13.  72 cb   JNAE rel8    Jump short if not above or equal (CF=1)
14.  73 cb   JNB rel8     Jump short if not below (CF=0)
15.  77 cb   JNBE rel8    Jump short if not below or equal (CF=0 and ZF=0)
16.  73 cb   JNC rel8     Jump short if not carry (CF=0)
17.  75 cb   JNE rel8     Jump short if not equal (ZF=0)
18.  7E cb   JNG rel8     Jump short if not greater (ZF=1 or SF≠OF)
19.  7C cb   JNGE rel8    Jump short if not greater or equal (SF≠OF)
20.  7D cb   JNL rel8     Jump short if not less (SF=OF)
21.  7F cb   JNLE rel8    Jump short if not less or equal (ZF=0 and SF=OF)
22.  71 cb   JNO rel8     Jump short if not overflow (OF=0)
23.  7B cb   JNP rel8     Jump short if not parity (PF=0)
24.  79 cb   JNS rel8     Jump short if not sign (SF=0)
25.  75 cb   JNZ rel8     Jump short if not zero (ZF=0)
26.  70 cb   JO rel8      Jump short if overflow (OF=1)
27.  7A cb   JP rel8      Jump short if parity (PF=1)
28.  7A cb   JPE rel8     Jump short if parity even (PF=1)
29.  7B cb   JPO rel8     Jump short if parity odd (PF=0)
30.  78 cb   JS rel8      Jump short if sign (SF=1)
31.  74 cb   JZ rel8      Jump short if zero (ZF=1)

The following examples in debug show sample instructions, together with their unassembly:
-a
1377:0100 ja 114
1377:0102 jnb 1234
              ^ Error        ; relative address more than 8 bits
1377:0102 jae 85
1377:0104
-u 100 103
1377:0100 7712     JA 0114    ; address relative to 102H is 12H
1377:0102 7381     JNB 0085   ; address relative to 104H is 81H, or -7FH
1377:0104
-q
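To make the distinction between ‘above’ and ‘greater’ concrete, the Python sketch below computes the flags that CMP would set for two bytes and evaluates each short conditional jump from the low nibble of its opcode (70h to 7Fh), following the conditions in the table. It is a model, not 8086 code; `cmp8`, `CONDITIONS` and `jcc_taken` are hypothetical names, and AF is omitted because no conditional jump tests it.

```python
def cmp8(a, b):
    """Flags set by CMP a,b on byte operands (computes a - b, keeps only flags)."""
    r = (a - b) & 0xFF
    return {
        'CF': int(a < b),                              # borrow, unsigned view
        'ZF': int(r == 0),
        'SF': r >> 7,                                  # sign bit of the result
        'OF': int(((a ^ b) & (a ^ r) & 0x80) != 0),    # signed overflow
        'PF': int(bin(r).count('1') % 2 == 0),         # even parity of result
    }

# Condition of each short Jcc, indexed by the low nibble of opcode 7x.
CONDITIONS = {
    0x0: lambda f: f['OF'] == 1,                        # JO
    0x1: lambda f: f['OF'] == 0,                        # JNO
    0x2: lambda f: f['CF'] == 1,                        # JB / JC / JNAE
    0x3: lambda f: f['CF'] == 0,                        # JAE / JNB / JNC
    0x4: lambda f: f['ZF'] == 1,                        # JE / JZ
    0x5: lambda f: f['ZF'] == 0,                        # JNE / JNZ
    0x6: lambda f: f['CF'] == 1 or f['ZF'] == 1,        # JBE / JNA
    0x7: lambda f: f['CF'] == 0 and f['ZF'] == 0,       # JA / JNBE
    0x8: lambda f: f['SF'] == 1,                        # JS
    0x9: lambda f: f['SF'] == 0,                        # JNS
    0xA: lambda f: f['PF'] == 1,                        # JP / JPE
    0xB: lambda f: f['PF'] == 0,                        # JNP / JPO
    0xC: lambda f: f['SF'] != f['OF'],                  # JL / JNGE
    0xD: lambda f: f['SF'] == f['OF'],                  # JGE / JNL
    0xE: lambda f: f['ZF'] == 1 or f['SF'] != f['OF'],  # JLE / JNG
    0xF: lambda f: f['ZF'] == 0 and f['SF'] == f['OF'], # JG / JNLE
}

def jcc_taken(opcode, flags):
    """True if the short conditional jump with this opcode would jump."""
    return CONDITIONS[opcode & 0x0F](flags)
```

After `cmp8(0xFF, 0x01)`, JA (77) is taken because 255 is above 1, while JG (7F) is not, because as signed bytes -1 is less than 1. The model also shows that each opcode pair differing only in its lowest bit tests complementary conditions.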


Exercise: All the conditional jumps allow only displacements in the range -128 to +127 from the current location. If a conditional jump of longer range is required, how can you arrange for it? [Hint: try using, at the destination of the conditional jump, an unconditional jump with a 16-bit relative address.]
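One arrangement in the spirit of the hint is sketched below in Python. Instead of placing a separate JMP at the destination of the conditional jump, it uses a closely related and widely used variant: reverse the condition (on the 8086 the opcodes 70h to 7Fh come in complementary pairs that differ only in the lowest bit) and let the reversed short jump skip over a 3-byte JMP rel16 to the distant target. `encode_jcc` is a hypothetical helper; the first two test cases reproduce the debug encodings above.

```python
def encode_jcc(opcode, addr, target):
    """Encode a conditional jump (opcode 70h-7Fh) located at offset addr.

    Short form if the target is within -128..+127 of the next instruction;
    otherwise the reversed condition (opcode XOR 1) jumps over a JMP rel16.
    """
    rel8 = target - (addr + 2)
    if -128 <= rel8 <= 127:
        return bytes([opcode, rel8 & 0xFF])
    reversed_op = opcode ^ 1                    # e.g. JAE (73) -> JB (72)
    rel16 = (target - (addr + 5)) & 0xFFFF      # the JMP ends 5 bytes after addr
    return bytes([reversed_op, 0x03,            # skip the 3-byte JMP
                  0xE9, rel16 & 0xFF, rel16 >> 8])
```

The five bytes produced for a JAE at 0102 to 1234 read: JB over the next 3 bytes; otherwise JMP to 1234.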

• LOOP, LOOPZ (LOOPE) and LOOPNZ (LOOPNE): These are the unconditional and conditional loop instructions. Note that there are only two conditional loops, both based on the state of the zero flag: loop on zero (or loop on equal), and loop on not zero (or loop on unequal, that is, when the comparison of two data items shows that they are unequal).

Description: The LOOP instruction performs a loop operation using the CX register as a counter. Each time the LOOP instruction is executed, CX is decremented and then checked for 0. If the count is 0, the loop is terminated and program execution continues with the instruction following the LOOP instruction. If the count is not zero, a short (near) jump is performed to the destination (target) operand, which is normally the instruction at the beginning of the loop. (On the 80386 and later processors, ECX is used as the count register when the address-size attribute is 32 bits; otherwise CX is used.) The target instruction is specified with a relative offset, a signed offset relative to the current value of the instruction pointer in the IP register. In assembly code this offset is normally specified as a label, but at the machine code level it is encoded as a signed 8-bit immediate value that is added to the instruction pointer, so only offsets of -128 to +127 are allowed. The conditional loop instructions (LOOPcc) also test the ZF flag, which can terminate the loop before the count reaches zero; the condition code (cc) in each mnemonic indicates the condition being tested. The LOOPcc instruction itself does not affect ZF (none of the loop instructions changes any flag); ZF is changed by other instructions in the loop. LOOPZ stands for loop if zero, and LOOPNZ for loop if not zero.
Opcode  Instruction   Description
E2 cb   LOOP rel8     Decrement count; jump short if count ≠ 0
E1 cb   LOOPE rel8    Decrement count; jump short if count ≠ 0 and ZF=1
E1 cb   LOOPZ rel8    Decrement count; jump short if count ≠ 0 and ZF=1
E0 cb   LOOPNE rel8   Decrement count; jump short if count ≠ 0 and ZF=0
E0 cb   LOOPNZ rel8   Decrement count; jump short if count ≠ 0 and ZF=0
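The decrement-then-test order described above has a consequence worth checking: a loop entered with CX = 0 does not skip its body but runs 65,536 times, because CX wraps from 0 to FFFFh. The Python sketch below models one execution of LOOP, LOOPE or LOOPNE; the function names are hypothetical, and ZF is passed in rather than computed, since these instructions only read it.

```python
def loop_step(mnemonic, cx, zf):
    """One execution of LOOP/LOOPE/LOOPNE: returns (new_cx, jump_taken)."""
    cx = (cx - 1) & 0xFFFF            # decrement first; no flags are changed
    if mnemonic == 'LOOP':
        return cx, cx != 0
    if mnemonic in ('LOOPE', 'LOOPZ'):
        return cx, cx != 0 and zf == 1
    if mnemonic in ('LOOPNE', 'LOOPNZ'):
        return cx, cx != 0 and zf == 0
    raise ValueError('unknown loop instruction: ' + mnemonic)

def body_runs(cx):
    """How many times a loop body ending in LOOP executes for a starting CX."""
    runs = 0
    while True:
        runs += 1                     # the body executes, then LOOP
        cx, taken = loop_step('LOOP', cx, zf=0)
        if not taken:
            return runs
```

A JCXZ placed just before the loop is the usual guard against the CX = 0 case.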

• CALL: The call instruction is a returnable jump to the destination or target address given in the instruction. It can be used to execute two different types of calls:
Near call: a call to a procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment call.
Far call: a call to a procedure located in a segment other than the current code segment, sometimes referred to as an intersegment call.
Near Call: When executing a near call, the processor pushes the value of the IP register (which contains the offset of the instruction following the CALL instruction) onto the stack, for later use as the return-instruction pointer. The processor then branches to the address in the current code segment specified with the target operand. The target operand specifies either an absolute offset in the code segment (that is, an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current value of the instruction pointer in the IP register, which points to the instruction following the CALL instruction). The CS register is not changed on near calls. For a near call, an absolute offset is specified indirectly in a general-purpose register or a memory location (r/m16) and is loaded directly into the IP register. (On 32-bit processors, when an absolute offset is accessed indirectly using ESP as a base register, the base value used is the value of ESP before the instruction executes; in 16-bit 8086 addressing, SP cannot be used as a base register at all.)
Far Calls: When executing a far call, the processor pushes the current values of both the CS and IP registers onto the stack (CS first, then IP) for use as the return-instruction pointer. The processor then performs a “far branch” to the code segment and offset specified with the target operand for the called procedure. Here the target operand specifies an absolute far address either directly with a pointer (ptr16:16) or indirectly with a memory location (m16:16). With the pointer method, the segment and offset of the called procedure are encoded in the instruction as a 4-byte far address immediate. With the indirect method, the target operand specifies a memory location that contains a 4-byte far address. On the 8086 the offset is always 16 bits, and the far address is loaded into the CS and IP registers. (On 32-bit processors, the operand-size attribute determines whether the offset is 16 or 32 bits; the far address is loaded into CS and EIP, and with a 16-bit operand size the upper two bytes of EIP are cleared.)
Exercises: In the debug, check how the following instructions are machine coded by unassembling them:
CALL 156
CALL SHORT 156
CALL 6789
CALL BX
CALL SHORT BX
CALL [BX]
CALL NEAR [BX]
CALL SHORT [BX]
CALL FAR [BX]
CALL FAR 1234:5678
CALL SP
CALL [AX]

That will give you a fair idea of the machine codes used, as well as of the different modes of the call instruction. By far the most common form of call, however, gives the address of the procedure directly in the instruction; in ALP (assembly language programming) this is done by writing the label name of the procedure or subroutine.
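To see how the return address relates to the encoding, the Python sketch below models a direct near CALL (E8 with a 16-bit displacement) and a direct far CALL (9A followed by the offset and segment words), including the words pushed onto the stack. It is illustrative only: the function names are hypothetical, the stack is a dictionary keyed by offset in SS, and the example addresses (a CALL at offset 0100 with SP = FFEE) are assumptions.

```python
def near_call(addr, target, sp, stack):
    """Direct near CALL at offset addr. Returns (code bytes, new IP, new SP)."""
    ret_ip = (addr + 3) & 0xFFFF              # E8 lo hi is 3 bytes long
    disp = (target - ret_ip) & 0xFFFF         # relative to the next instruction
    sp = (sp - 2) & 0xFFFF
    stack[sp] = ret_ip                        # push the return IP
    return bytes([0xE8, disp & 0xFF, disp >> 8]), target, sp

def far_call(cs, addr, seg, off, sp, stack):
    """Direct far CALL seg:off at cs:addr. Returns (code bytes, (CS, IP), new SP)."""
    ret_ip = (addr + 5) & 0xFFFF              # 9A + offset word + segment word
    sp = (sp - 2) & 0xFFFF
    stack[sp] = cs                            # CS is pushed first ...
    sp = (sp - 2) & 0xFFFF
    stack[sp] = ret_ip                        # ... then IP, so IP is on top
    code = bytes([0x9A, off & 0xFF, off >> 8, seg & 0xFF, seg >> 8])
    return code, (seg, off), sp
```

A backward call produces a wrapped displacement: a call to 0000 from offset 0100 is encoded E8 FD FE, since 0000 - 0103 = -103h = FEFDh.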


• RET: The RET (return) instruction returns control from a procedure to the program that called it. It transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, so the return is made to the instruction that follows the CALL. The optional source operand specifies the number of stack bytes to be released after the return address is popped; the default is none. This operand can be used to release parameters that were passed to the called procedure on the stack and are no longer needed. Exercises: Study the following RET instructions in the debug and see their machine codes:

RET
RET NEAR
RETF        ; this stands for return far
RET 120
RETF 120
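The effect of the optional operand can be seen with a similar model of RET and RETF. In this Python sketch (hypothetical names, a dictionary stack, and an assumed caller that pushed two word parameters before a near call), the return address is popped first and the n bytes are released afterwards. The encoding helper uses the 8086 opcodes C3 (RET), C2 iw (RET n), CB (RETF) and CA iw (RETF n); treating n = 0 as "no operand" is a simplification of this sketch.

```python
def ret(sp, stack, far=False, n=0):
    """Model RET/RETF [n]: returns (ip, cs or None, new SP)."""
    ip = stack[sp]
    sp = (sp + 2) & 0xFFFF
    cs = None
    if far:
        cs = stack[sp]                  # RETF also pops CS
        sp = (sp + 2) & 0xFFFF
    sp = (sp + n) & 0xFFFF              # discard n bytes of parameters
    return ip, cs, sp

def ret_code(far=False, n=0):
    """Machine code for RET/RETF; n == 0 is taken to mean 'no operand'."""
    if n == 0:
        return bytes([0xCB if far else 0xC3])
    return bytes([0xCA if far else 0xC2, n & 0xFF, n >> 8])
```

With two word parameters on the stack, RET 4 leaves SP exactly where it was before the caller pushed them.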

• INT n, INTO and INT 3: These instructions are software interrupt procedure calls. Software interrupts are special procedures that can be invoked using an 8-bit number, known as the interrupt number. Many system services are provided through software interrupts, and the procedures invoked in this way are normally known as interrupt service routines. I/O devices can also obtain service through the same mechanism. A device first gets the attention of the processor by activating its interrupt pin. The processor then goes through an interrupt-acknowledge sequence, during which the interrupting device supplies an 8-bit number n, and the processor invokes the service routine for INT n. This process is known as a hardware interrupt. Once an interrupt is invoked, the processor pushes FLAGS, CS and IP (IP being the offset of the instruction immediately following the interrupt call) and clears the IF and TF flags; the change from EI to DI in the debug trace below shows IF being cleared. It then loads new values into CS and IP, which amounts to a returnable far call: returnable because the old values of CS and IP are stored on the stack along with the old flags. The operand n in the instruction specifies an interrupt vector number from 0 to 255, encoded as an 8-bit unsigned immediate value. Each vector number n is an index into the interrupt vector table, an array of 4-byte entries, each holding the far address (offset, then segment) of the handler for that n; the entry for n begins at physical address 4 × n. With 256 interrupts of 4 bytes each, the table occupies the lowest 1 KB of the memory space. The first 32 interrupt vector numbers are reserved by Intel for system use, and some of them are used for internally generated exceptions. INT n is the general mnemonic for executing a software-generated call to the interrupt handler with vector number n.
INTO is a special mnemonic for calling the overflow exception, interrupt vector number 4: it checks the OF flag in the FLAGS register and calls the overflow interrupt handler (vector 4) only if OF is set to 1. INT 3 has a special one-byte opcode (CC) intended for calling the debug exception handler. This one-byte form is valuable because it can replace the first byte of any instruction with a breakpoint, including other one-byte instructions, without overwriting other code.
Exercise: 1. Unassemble the following instructions in the debug:
INTO
INT 3
INT 4
INT 73

2. Although INTO and INT 4 appear to be the same, INTO is a conditional call of the vector 4 interrupt based on the overflow flag, whereas INT 4 is an unconditional software interrupt at vector 4 that occurs even when there is no overflow, as seen in the following debug experiment. The experiment starts with OF clear (shown as NV), and as the trace shows, INT 4 branches to the interrupt routine while INTO does not. After OF is set, executing INTO invokes the interrupt at vector 4.

-a
1377:0100 int 4
1377:0102 into
1377:0103
-d 0000:0010 0013          ; the data below shows the interrupt 4 vector
0000:0010  8B 01 70 00     ..p.
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 CD04 INT 04
-t

AX=0000 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=0070 IP=018B NV UP DI PL NZ NA PO NC
0070:018B 1E PUSH DS        ; INT 4 has taken place even though OF is not set (NV = no overflow)
-rip
IP 018B
:102
-rcs
CS 0070
:1377                       ; get back to CS:IP = original 1377:0102 (the INTO instruction)
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0102 NV UP DI PL NZ NA PO NC
1377:0102 CE INTO
-t
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0103 NV UP DI PL NZ NA PO NC
1377:0103 0000 ADD [BX+SI],AL   ; INTO has not occurred, as OF is not set (NV = no overflow)

-rip
IP 0103
:102                        ; again get back to the INTO instruction
-rf
NV UP DI PL NZ NA PO NC -ov     ; set the overflow flag
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0102 OV UP DI PL NZ NA PO NC
1377:0102 CE INTO
-t

AX=0000 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=0070 IP=018B OV UP DI PL NZ NA PO NC
0070:018B 1E PUSH DS        ; with OF set, interrupt 4 has occurred

Exercise: Check whether the codes CD 03 and CC, both of which stand for interrupt 3, differ in execution. [Hint: the code CC is only a convenience for debugging purposes, to provide breakpoints.]
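The vector lookup and the pushes described for INT n can be modelled as follows. This Python sketch is an illustration, not DOS or BIOS code: the function names are hypothetical, the stack is a dictionary keyed by offset in SS, and the FLAGS value 0202h used in the test is an assumed example with IF set. The vector bytes 8B 01 70 00 and the SP change from FFEE to FFE8 are taken from the debug experiment above.

```python
TF, IF = 0x0100, 0x0200        # FLAGS bit 8 (trap) and bit 9 (interrupt enable)

def vector_address(n):
    """Physical address of the 4-byte vector-table entry for INT n."""
    return 4 * (n & 0xFF)

def decode_vector(entry):
    """A vector entry holds the offset (low word) followed by the segment."""
    offset = entry[0] | (entry[1] << 8)
    segment = entry[2] | (entry[3] << 8)
    return segment, offset

def int_entry(flags, cs, next_ip, sp, stack, vector):
    """Push FLAGS, CS and IP (in that order), clear IF and TF, load CS:IP.

    Returns (flags, cs, ip, sp) as seen on entry to the handler.
    """
    for word in (flags, cs, next_ip):
        sp = (sp - 2) & 0xFFFF
        stack[sp] = word
    flags &= ~(IF | TF) & 0xFFFF
    cs, ip = vector
    return flags, cs, ip, sp
```

So the entry at 0000:0010 decodes to the handler address 0070:018B, the CS:IP seen after INT 04 was traced.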

• IRET: Return from interrupt: The IRET instruction performs a far return to the interrupted program or procedure. During this operation, the processor pops the return instruction pointer, the return code segment and the FLAGS image from the stack into the IP, CS and FLAGS registers respectively, and then resumes execution of the interrupted program or procedure.
Exercise: Why should the flags be saved on entry to the interrupt service routine, and why should they be restored on return? What about other registers used by the interrupt routine? How is their integrity maintained on return? [Hint: it is the responsibility of the interrupt routine to return them intact.] Why are there no instructions like IRET NEAR, IRETF or IRET n? [Hint: consider hardware interrupts by I/O devices.]
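A matching sketch of IRET, under the same assumptions as a model of the INT entry sequence (hypothetical names, a dictionary stack keyed by offset in SS), pops IP, CS and FLAGS in that order. Because the FLAGS image saved on entry still has IF set, IRET re-enables the interrupts that the INT sequence disabled, which bears on the first question of the exercise above.

```python
IF = 0x0200   # FLAGS bit 9, interrupt enable

def iret(sp, stack):
    """Pop IP, CS and FLAGS (in that order) from a word-addressed stack.

    Returns (ip, cs, flags, sp) after the return.
    """
    ip = stack[sp]
    cs = stack[(sp + 2) & 0xFFFF]
    flags = stack[(sp + 4) & 0xFFFF]
    return ip, cs, flags, (sp + 6) & 0xFFFF
```

The frame used in the check is the one INT 4 would leave at 1377:0102 with SP = FFEE and an assumed FLAGS value of 0202h.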

7. String Instructions: String instructions operate on strings of bytes or words, allowing them to be moved between memory and a register or from memory to memory. There are several instructions in this category, involving comparison of memory data with AL or AX, comparison of two data arrays, handling an array of input from an input device, outputting a data array through an output device and so on, as we shall see from the instruction details below. When memory arrays are used, the source array is addressed by DS:SI (a segment override prefix such as ES: can replace DS), and the destination address is always ES:DI (it cannot be overridden). The SI and DI values change at every execution of the instruction, so that these addresses point to the next data element of the relevant arrays. The next element may be in the upward or the downward direction (SI and DI increasing or decreasing by 1 for byte operations or by 2 for word operations). When the direction flag D is 0, the upward direction (address increasing) is taken; when it is 1, the downward direction (address decreasing) is taken. The direction flag can be controlled by the instructions CLD (clear direction flag D) and STD (set direction flag D). These instructions can be used with the REP prefix to repeat a certain number of times, as per the array length, which must be loaded into CX before the repeated instruction is executed.

• MOVSB, MOVSW: Move string byte, move string word. It is only in these two instructions and the two CMPS instructions discussed next that both the source and the destination operands are in memory, although they are specified implicitly; apart from PUSH and POP with a memory operand, which transfer a word between memory and the stack, all other instructions have at least one operand in a register. Here ‘move’ really means copy: a string of bytes or words is copied to another place in memory, and the source is left unchanged. These instructions copy the byte or word specified with the second operand (source operand) to the location specified with the first operand (destination operand). Both operands are located in memory. The address of the source operand is read from the DS:SI registers (ES:SI with an ES: segment override prefix). The address of the destination operand is always ES:DI. Note that these operands are not explicitly mentioned in the instruction, but implied: the instructions are just MOVSB and MOVSW by themselves. After the data move, SI and DI are modified according to the D flag and according to whether it is a byte or a word move. Why is it necessary to have both upward and downward addressing? Study the debug experiment below.

-e 250              ; enter the source array values at address DS:250 onwards
1377:0250  74.1   03.2   E9.3   6A.4   FF.5   B8.     ; source array is: 01 02 03 04 05
-a                  ; assemble at 100
1377:0100 mov cx,5        ; number of bytes to be transferred
1377:0103 mov si, 250     ; source array start
1377:0106 mov di, 252     ; destination array start
1377:0109 rep movsb       ; repeat byte transfer CX times
1377:010B                 ; assembly over
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 B90500 MOV CX,0005    ; note, ES and DS are both the same
-t8
AX=0000 BX=0000 CX=0005 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0103 NV UP EI PL NZ NA PO NC
1377:0103 BE5002 MOV SI,0250
AX=0000 BX=0000 CX=0005 DX=0000 SP=FFEE BP=0000 SI=0250 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI PL NZ NA PO NC
1377:0106 BF5202 MOV DI,0252


AX=0000 BX=0000 CX=0005 DX=0000 SP=FFEE BP=0000 SI=0250 DI=0252
DS=1377 ES=1377 SS=1377 CS=1377 IP=0109 NV UP EI PL NZ NA PO NC
1377:0109 F3 REPZ
1377:010A A4 MOVSB

AX=0000 BX=0000 CX=0004 DX=0000 SP=FFEE BP=0000 SI=0251 DI=0253
DS=1377 ES=1377 SS=1377 CS=1377 IP=0109 NV UP EI PL NZ NA PO NC
1377:0109 F3 REPZ
1377:010A A4 MOVSB
AX=0000 BX=0000 CX=0003 DX=0000 SP=FFEE BP=0000 SI=0252 DI=0254
DS=1377 ES=1377 SS=1377 CS=1377 IP=0109 NV UP EI PL NZ NA PO NC
1377:0109 F3 REPZ
1377:010A A4 MOVSB
AX=0000 BX=0000 CX=0002 DX=0000 SP=FFEE BP=0000 SI=0253 DI=0255
DS=1377 ES=1377 SS=1377 CS=1377 IP=0109 NV UP EI PL NZ NA PO NC
1377:0109 F3 REPZ
1377:010A A4 MOVSB
AX=0000 BX=0000 CX=0001 DX=0000 SP=FFEE BP=0000 SI=0254 DI=0256
DS=1377 ES=1377 SS=1377 CS=1377 IP=0109 NV UP EI PL NZ NA PO NC
1377:0109 F3 REPZ
1377:010A A4 MOVSB
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0255 DI=0257
DS=1377 ES=1377 SS=1377 CS=1377 IP=010B NV UP EI PL NZ NA PO NC
1377:010B 0100 ADD [BX+SI],AX    ; MOVSB not repeated
-d 250 258
1377:0250  01 02 01 02 01 02 01 00-EB     .........
; note, the copying has not come out correctly; can you figure out why?
; [Hint: look at the D flag. Work out how the program can be corrected.]
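The overlap problem can be reproduced with a small model. This Python sketch is hypothetical (the name `rep_movsb` and the flat `bytearray` standing in for one segment with DS = ES are assumptions, and 16-bit wrap-around of SI and DI is ignored); it copies byte by byte, in the order REP MOVSB would. The first call mirrors the experiment with the D flag clear; the second starts from the last element with the D flag set, which is one way to correct the program.

```python
def rep_movsb(mem, si, di, cx, df):
    """Copy CX bytes from mem[SI] to mem[DI], one byte per repetition.

    df = 0 steps SI and DI upward (CLD); df = 1 steps them downward (STD).
    Returns the final (si, di, cx).
    """
    step = -1 if df else 1
    while cx != 0:
        mem[di] = mem[si]
        si += step
        di += step
        cx -= 1
    return si, di, cx

up = bytearray([1, 2, 3, 4, 5, 0, 0, 0])
rep_movsb(up, si=0, di=2, cx=5, df=0)      # like the experiment: D flag clear

down = bytearray([1, 2, 3, 4, 5, 0, 0, 0])
rep_movsb(down, si=4, di=6, cx=5, df=1)    # start at the ends, D flag set
```

In the 8086 program this corresponds to adding STD before REP MOVSB and loading SI with 254 and DI with 256, the addresses of the last source and destination bytes.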

• CMPSB, CMPSW: Compare string byte and compare string word. These instructions compare the byte or word at DS:SI with the byte or word at ES:DI: the flags are set as for a subtraction of the ES:DI operand from the DS:SI operand, and the result itself is discarded. The operands are not explicitly mentioned in the instruction, but implied; the instructions are just CMPSB and CMPSW by themselves. After the comparison, SI and DI are modified according to the D flag and to whether it is a byte or a word compare, exactly as in the MOVSB and MOVSW instructions. The flag register is modified according to the result of the comparison, as in a normal CMP instruction. These compare string instructions can also take a repeat prefix, like the string move instructions, with CX initialized to the length of the array in bytes or words (depending on the instruction) and decremented with every repetition; however, an unconditional repeat would be meaningless here, because only the result of the last comparison would survive. Two conditional repeats are therefore provided as prefixes to these instructions: REPZ (same as REPE) and REPNZ (same as REPNE). REPZ (REPE) exits from the repeat loop when a mismatch occurs between the two array elements compared, even if CX has not reached 0. REPNZ (REPNE) stops repeating when a match occurs between the array elements compared, even if CX has not reached 0. Note that REP, REPZ and REPE have the same opcode (F3). This means that any of these prefixes can be used with MOVSB or MOVSW, and when so used the zero flag is neither checked nor modified during execution; the instruction simply keeps repeating until CX becomes 0. With CMPSB or CMPSW, the DEBUG trace below shows the flag display unchanged from one repetition to the next, and at exit (whether because the condition failed or because CX reached zero) the zero flag shows the result of the comparison that ended the loop. Intel's documentation, however, describes the REPE and REPNE prefixes as testing ZF after each iteration, so the unchanged display during single-stepping may reflect the debugging environment; what a program can rely on is the state of the flags at exit. The following assembled program, and the tracing of its execution in the debug, brings out this behaviour.

Note also that CMPS and SCAS (discussed next) are the only string instructions that distinguish between the REPZ and REPNZ prefixes; all other string instructions treat the two prefixes alike.

STUDY OF REP CMPSB INSTRUCTION
ASSEMBLY LANGUAGE PROGRAM:
data segment
cmpdt   db 1,2,45,4,5,6,7,8, 1,2,3,4,5,6,7,8   ; 3rd byte not matching; group 1 compared with group 2
data ends
code segment
assume cs:code, ds:data, es:data
start:  mov ax, data
        mov ds, ax
        mov es, ax        ; initialize DS and ES
back2:  lea si, cmpdt     ; point to start of group 1
        mov di, si
        add di, 8         ; point to start of group 2
        cmp si, di        ; reset zero flag
back1:  mov cx, 5
        rep cmpsb         ; compare 5 bytes (the third is a mismatch)
        jnz back1         ; on mismatch, compare the next 5 bytes, which match completely
        jmp back2         ; start all over again (if you want another run)
        int 1             ; you won't reach here, but just in case!
code ends
end start

; UNASSEMBLING IN THE DEBUG
-u 0 1c                ; unassemble from 0 to 1c
13D6:0000 B8D513     MOV AX,13D5
13D6:0003 8ED8       MOV DS,AX
13D6:0005 8EC0       MOV ES,AX
13D6:0007 8D360000   LEA SI,[0000]
13D6:000B 39FE       MOV DI,SI
13D6:000D 83C708     ADD DI,+08
13D6:0010 3BC0       CMP SI,DI
13D6:0012 B90500     MOV CX,0005
13D6:0015 F3         REPZ


13D6:0016 A6         CMPSB
13D6:0017 75F9       JNZ 0012
13D6:0019 EBEC       JMP 0007
13D6:001B CD01       INT 01
-g 7                   ; execute up to (but not including) the instruction at CS:7
AX=13D5 BX=0000 CX=002D DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0007 NV UP EI PL NZ NA PO NC
13D6:0007 8D360000 LEA SI,[0000]    DS:0000=0201   ; explain the indication DS:0000=0201
-d 0 f                 ; group 1, then group 2
13D5:0000  01 02 2D 04 05 06 07 08-01 02 03 04 05 06 07 08   ..-.............
-g 15                  ; execute till CS:15
AX=13D5 BX=0000 CX=0005 DX=0000 SP=0000 BP=0000 SI=0000 DI=0008
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0015 NV UP EI PL NZ NA PE NC
13D6:0015 F3 REPZ      ; next instruction to be executed
13D6:0016 A6 CMPSB
-ta                    ; trace execution of the next A hex (10) instructions

AX=13D5 BX=0000 CX=0004 DX=0000 SP=0000 BP=0000 SI=0001 DI=0009
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0015 NV UP EI PL NZ NA PE NC
13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0003 DX=0000 SP=0000 BP=0000 SI=0002 DI=000A
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0015 NV UP EI PL NZ NA PE NC
13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0002 DX=0000 SP=0000 BP=0000 SI=0003 DI=000B
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0017 NV UP EI PL NZ NA PO NC
13D6:0017 75F9 JNZ 0012    ; first exit from the loop (NZ)
; Note: the result of the comparison is available in the zero flag at exit from the loop.
AX=13D5 BX=0000 CX=0002 DX=0000 SP=0000 BP=0000 SI=0003 DI=000B
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0012 NV UP EI PL NZ NA PO NC
13D6:0012 B90500 MOV CX,0005
AX=13D5 BX=0000 CX=0005 DX=0000 SP=0000 BP=0000 SI=0003 DI=000B
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0015 NV UP EI PL NZ NA PO NC
13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0004 DX=0000 SP=0000 BP=0000 SI=0004 DI=000C
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0015 NV UP EI PL NZ NA PO NC
13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0003 DX=0000 SP=0000 BP=0000 SI=0005 DI=000D
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0015 NV UP EI PL NZ NA PO NC
13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0002 DX=0000 SP=0000 BP=0000 SI=0006 DI=000E
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0015 NV UP EI PL NZ NA PO NC
13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB


AX=13D5 BX=0000 CX=0001 DX=0000 SP=0000 BP=0000 SI=0007 DI=000F
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0015 NV UP EI PL NZ NA PO NC
13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=0008 DI=0010
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0017 NV UP EI PL ZR NA PE NC
13D6:0017 75F9 JNZ 0012    ; second exit from loop (CX = 0)
; Note: the result of the comparison is put in the zero flag only at exit from the
; REP loop, in both loop exit situations. WHY? What is your conclusion from the experiment?
-q                         ; quit
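For comparison with the trace, here is a Python model of REPE CMPSB under stated assumptions: the hypothetical function `repe_cmpsb` treats DS and ES as one flat byte array, assumes the D flag is clear, and, following Intel's description, recomputes ZF on every repetition. Its results at exit match the two loop exits in the trace (CX=2 with NZ, then CX=0 with ZR).

```python
def repe_cmpsb(mem, si, di, cx, zf=0):
    """Model REPE (REPZ) CMPSB with the D flag clear and DS equal to ES.

    Each repetition sets ZF from [SI] - [DI], advances SI and DI,
    decrements CX, and the loop stops when the bytes differ or CX is 0.
    zf is the flag's value before the instruction (kept if CX starts at 0).
    Returns (si, di, cx, zf) at exit.
    """
    while cx != 0:
        zf = 1 if mem[si] == mem[di] else 0
        si += 1
        di += 1
        cx -= 1
        if zf == 0:                 # REPE ends the loop on a mismatch
            break
    return si, di, cx, zf

# The 16 data bytes of the program: group 1 at offset 0, group 2 at offset 8
CMPDT = bytes([1, 2, 0x2D, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8])
first = repe_cmpsb(CMPDT, si=0, di=8, cx=5)
second = repe_cmpsb(CMPDT, si=first[0], di=first[1], cx=5)
```

The values of SI, DI and CX returned by the two calls are the ones shown at the two JNZ instructions in the trace.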

• SCASB, SCASW: Scan string byte, scan string word. These instructions are the same as the CMPSB and CMPSW instructions of the previous section, except that the source for the comparison is the register AL for SCASB, or AX for SCASW. The destination is the same, namely ES:DI, and on execution DI is made to point to the next byte or word, according to the instruction and the direction flag, as in the earlier MOVS and CMPS instructions. With CX initialized to the length of the destination array, and DI initialized to the start address of the array when the D flag is reset (or to its end address when the D flag is set), we can use the conditional prefixes REPE (REPZ) or REPNE (REPNZ). The repetition then continues until the condition fails or until CX reaches zero (that is, the destination array is exhausted). At exit, the zero flag holds the result of the last comparison, exactly as we saw in the case of the CMPS instruction.
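A common use of REPNE SCASB is to search an array for a given byte, for example a terminating zero. The Python sketch below models it under assumptions: the name `repne_scasb` is hypothetical, the D flag is taken as clear, and `zf` is passed in as the flag's value before the instruction (it is unchanged if CX starts at 0). On a match, the byte found is at DI - 1, because DI has already been advanced past it.

```python
def repne_scasb(mem, di, cx, al, zf=0):
    """Model REPNE SCASB with the D flag clear.

    Compares AL with mem[DI] (flags as for AL - [ES:DI]), advances DI,
    decrements CX, and stops when the bytes are equal or CX reaches 0.
    Returns (di, cx, zf) at exit.
    """
    while cx != 0:
        zf = 1 if al == mem[di] else 0
        di += 1
        cx -= 1
        if zf == 1:                # REPNE ends the loop on a match
            break
    return di, cx, zf

text = b"HELLO\x00XYZ"
di, cx, zf = repne_scasb(text, di=0, cx=len(text), al=0)
```

After the loop, test ZF rather than CX: a match on the last element also leaves CX at 0, but with ZF = 1.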

• LODSB, LODSW: Load string byte, load string word. These instructions are similar to MOVSB and MOVSW, except that the destination of the move is AL for LODSB and AX for LODSW. The source is DS:SI. On execution the data come into AL or AX, and SI is modified accordingly. The REP (REPE, REPZ) prefix may be used as with the MOVS instructions, although a repeated LODS is of limited use, since each load overwrites the value loaded before it and only the last element remains in AL or AX.

The example below shows that the repeat prefix produces the same result for this instruction whether it is written as REPE or as REPNE (see the discussion in connection with the CMPS instruction).

THE .asm PROGRAM
code segment
assume cs:code
start:  mov ax, cs
        mov ds, ax          ; DS is made the same as CS
        mov cx, 6
        mov si, offset array
        repe lodsb
        mov cx, 6
        repne lodsb
        int 1
        jmp start
array   db 00,11h,22h,33h,44h,55h,66h,77h,88h,99h,0aah,0bbh
code ends
end start

TESTING IN THE DEBUG


-u 0 14
13DC:0000 8CC8      MOV AX,CS
13DC:0002 8ED8      MOV DS,AX
13DC:0004 B90600    MOV CX,0006
13DC:0007 BE1500    MOV SI,0015
13DC:000A F3        REPZ
13DC:000B AC        LODSB
13DC:000C B90600    MOV CX,0006
13DC:000F F2        REPNZ
13DC:0010 AC        LODSB
13DC:0011 CD01      INT 01
13DC:0013 EBEB      JMP 0000
-d cs:15 20
13DC:0010                 00 11 22-33 44 55 66 77 88 99 AA        .."3DUfw...
13DC:0020  BB                                                      .
-r
AX=0000 BX=0000 CX=0021 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=0000 NV UP EI PL NZ NA PO NC
13DC:0000 8CC8 MOV AX,CS
-g
AX=13BB BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=0021 DI=0000
DS=13DC ES=13CC SS=13DC CS=13DC IP=0013 NV UP EI PL NZ NA PO NC
13DC:0013 EBEB JMP 0000
-q

Exercise: From the example shown, try to prove that, as far as executing the LODS instructions is concerned, the prefix REPE behaves the same as the prefix REPNE, although they are coded differently (F3 and F2 in the listing). Intel literature gives only the REP prefix (coded F3, the same as REPE) for this purpose, which is the authentic coding for this repeat operation.

• STOSB, STOSW: Store string byte, store string word: These instructions are similar to the above, except that here the source is the accumulator (AL for byte store and AX for word store operations) and the destination is ES:DI. The destination address is modified after execution as in the LODS instructions. The repeat loop, with CX initialized and the REP (REPE, REPZ) prefix, also works as in the LODS instructions. This can be used to initialize a memory block with fixed data, as shown in the assembly language program segment below:

        CLD                 ; D flag reset: DI is incremented
        MOV CX, 100H        ; number of words to be stored
        MOV DI, 500H        ; starting offset in the extra segment
        MOV AX, 0           ; the data to be stored
        REP STOSW

With ES pointing to the target segment, this program segment, when executed, clears memory from address ES:500 to ES:6FF inclusive: 100H (256) words, that is, 512 bytes, from address 500.
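To check exactly which bytes REP STOSW touches, here is a small Python model. The helper name `rep_stosw` and the `bytearray` standing in for the extra segment are assumptions for illustration. Each word is stored low byte first (little-endian), and DI moves by 2 per store. With CX = 100H, the model writes 256 words, that is, 200H bytes, starting at offset 500H.

```python
def rep_stosw(es_segment, di, cx, ax, df=0):
    """Model of REP STOSW: store AX at ES:DI, CX times. Returns (di, cx)."""
    step = -2 if df else 2                        # word store: DI changes by 2
    while cx != 0:
        es_segment[di] = ax & 0xFF                # low byte at the lower address
        es_segment[(di + 1) & 0xFFFF] = ax >> 8   # high byte next
        di = (di + step) & 0xFFFF
        cx -= 1
    return di, cx


segment = bytearray(b"\xFF" * 0x10000)        # a 64 KB segment, all FFh
di, cx = rep_stosw(segment, 0x500, 0x100, 0x0000)
first = segment.index(0)                      # first cleared byte
last = 0x10000 - 1 - segment[::-1].index(0)   # last cleared byte
print(hex(first), hex(last), hex(di))         # 0x500 0x6ff 0x700
```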


8. Flag Control Instructions: There are two types of instructions that control the flags. The first type controls specific flags, namely the C flag, the D flag and the I flag. The other type moves either the entire flag register or its least significant byte. The details are given below:

• STC, CLC and CMC: These instructions control the carry flag in the flag register. They stand for set carry, clear carry and complement carry. No other flags are affected by these instructions.

STC operation: CF ← 1
CLC operation: CF ← 0
CMC operation: CF ← NOT CF

• STD and CLD: These instructions control the direction flag in the flag register. They stand for set and clear the direction flag. Other flags are not affected by these instructions. The need for controlling the D flag is already seen in connection with the string instructions.

STD operation: DF ← 1; enables string addresses to be decremented
CLD operation: DF ← 0; enables string addresses to be incremented

• STI and CLI: These instructions modify the interrupt flag in the flag register. When this I flag is set, the processor accepts hardware interrupt requests on its INTR pin. When the flag is reset, the processor will not be interrupted by external hardware activating the INTR pin. The non-maskable interrupt (NMI) is not affected by the I flag. Software interrupts are not disabled by clearing the I flag either, as the debug experiment below shows. Other flags are not affected by these instructions.

STI operation: IF ← 1; hardware interrupts enabled.
CLI operation: IF ← 0; hardware interrupts disabled.

The debug experiment:

-a                      ; assemble at 100 onwards
1377:0100 cli
1377:0101 int 20
1377:0103
-u 100 102              ; unassemble 100 to 102
1377:0100 FA            CLI
1377:0101 CD20          INT     20
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 FA            CLI
-t2
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0101 NV UP DI PL NZ NA PO NC
1377:0101 CD20          INT     20
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=00A7 IP=1072 NV UP DI PL NZ NA PO NC
00A7:1072 90            NOP
; INT 20 vector address: segment = 0000, offset = 80-83.
-d 0000:80 83
0000:0080  72 10 A7 00
; It is clear that the interrupt is invoked even with
; interrupts disabled by clearing the I flag.
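Each of the six single-flag instructions above sets, clears or inverts one bit of the 16-bit flag register: CF is bit 0, IF is bit 9 and DF is bit 10. The Python functions below are a minimal sketch of this. The model treats FLAGS as a plain integer (an assumption), and the function names simply mirror the mnemonics. Take a flag value of 0202 hex, which DEBUG would display as NV UP EI PL NZ NA PO NC. CLI turns it into 0002 hex, which matches the change from EI to DI in the experiment.

```python
CF_BIT, IF_BIT, DF_BIT = 0, 9, 10    # positions in the 8086 flag register


def stc(flags):
    return flags | (1 << CF_BIT)     # CF <- 1


def clc(flags):
    return flags & ~(1 << CF_BIT)    # CF <- 0


def cmc(flags):
    return flags ^ (1 << CF_BIT)     # CF <- NOT CF


def std(flags):
    return flags | (1 << DF_BIT)     # DF <- 1: string addresses decrement


def cld(flags):
    return flags & ~(1 << DF_BIT)    # DF <- 0: string addresses increment


def sti(flags):
    return flags | (1 << IF_BIT)     # IF <- 1: INTR requests accepted


def cli(flags):
    return flags & ~(1 << IF_BIT)    # IF <- 0: INTR requests ignored


print(hex(cli(0x0202)))              # 0x2
```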


• LAHF: Load flag register (lower byte) into register AH. This is a flag register move instruction.

Description: Moves the low byte of the EFLAGS register (which includes the status flags SF, ZF, AF, PF, and CF) to the AH register. Reserved bits 1, 3, and 5 of the EFLAGS register are set in the AH register as shown in the “Operation” section below.
Operation: AH ← FLAGS(SF:ZF:0:AF:0:PF:1:CF);
Flags Affected: None (that is, the state of the flags in the EFLAGS register is not affected).

• SAHF: Store the contents of AH register into the lower byte of Flag register.

Description: Loads the SF, ZF, AF, PF, and CF flags of the FLAGS register with values from the corresponding bits in the AH register (bits 7, 6, 4, 2, and 0, respectively). Bits 1, 3, and 5 of register AH are ignored; the corresponding reserved bits (1, 3, and 5) in the FLAGS register remain as shown in the “Operation” section below.
Operation: FLAGS(SF:ZF:0:AF:0:PF:1:CF) ← AH;
Flags Affected: The SF, ZF, AF, PF, and CF flags are loaded with values from the AH register. Bits 1, 3, and 5 of the EFLAGS register are unaffected, with the values remaining 1, 0, and 0, respectively.
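The bit layout in the two Operation lines can be expressed as masks. SF, ZF, AF, PF and CF occupy bits 7, 6, 4, 2 and 0, which together form the mask D5 hex. A minimal Python sketch follows. The function names and the integer representation of FLAGS are assumptions for illustration.

```python
STATUS_MASK = 0xD5   # bits 7 (SF), 6 (ZF), 4 (AF), 2 (PF), 0 (CF)


def lahf(flags):
    """AH <- SF:ZF:0:AF:0:PF:1:CF taken from the low byte of FLAGS."""
    return (flags & STATUS_MASK) | 0x02    # bit 1 reads as 1; bits 3, 5 as 0


def sahf(flags, ah):
    """Load SF, ZF, AF, PF, CF from AH; all other FLAGS bits are kept."""
    return (flags & ~STATUS_MASK & 0xFFFF) | (ah & STATUS_MASK)


print(hex(lahf(0x3202)), hex(sahf(0x0202, 0xFF)))   # 0x2 0x2d7
```

For example, a flag value of 3202 hex has all five status flags clear, so LAHF gives 02 hex. Storing FF hex with SAHF sets all five status flags but leaves bits 3 and 5 clear.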

• PUSHF and POPF: These instructions have already been discussed in connection with the data transfer instructions (classified under type 1 instructions). Note that when we pop from the stack into the flag register, only the bits that represent flags are transferred. The other bits (marked as don't cares in the description of the flag register) are not altered. The debug program below demonstrates this feature of the flag register.

-a
1377:0100 pushf         ; flag register → stack top
1377:0101 pop ax        ; stack top → AX
1377:0102 xor ax,f02a   ; the non-flag bits are complemented in AX
1377:0105 push ax       ; modified AX → stack top
1377:0106 popf          ; and thence to the flag register
1377:0107 pushf         ; the modified flag register → stack top
1377:0108 pop ax        ; and thence to the register AX
1377:0109
-u 100 108
1377:0100 9C            PUSHF
1377:0101 58            POP     AX
1377:0102 352AF0        XOR     AX,F02A
1377:0105 50            PUSH    AX
1377:0106 9D            POPF
1377:0107 9C            PUSHF
1377:0108 58            POP     AX
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 9C            PUSHF
-t7
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEC BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0101 NV UP EI PL NZ NA PO NC
1377:0101 58            POP     AX
AX=3202 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0102 NV UP EI PL NZ NA PO NC
1377:0102 352AF0        XOR     AX,F02A
; original content of flags = 3202
; XOR with 1111 0000 0010 1010 binary, the word chosen to complement
; the non-flag bits of the flag register.
; Note: the non-flag bits, especially the bits of the m.s. nibble of the
; flag register, were later used for identifying different x86 series of
; processors. What we see in the m.s. nibble of the flag register here is
; not 1111, which is the ID for the 8086. Here, I have a Pentium mobile
; processor operating in the real mode. Hence the nibble here is an
; unchangeable 0011, or 3 hex.
AX=C228 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0105 NV UP EI NG NZ NA PE NC
1377:0105 50            PUSH    AX
AX=C228 BX=0000 CX=0000 DX=0000 SP=FFEC BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI NG NZ NA PE NC
1377:0106 9D            POPF
AX=C228 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0107 NV UP EI PL NZ NA PO NC
1377:0107 9C            PUSHF
AX=C228 BX=0000 CX=0000 DX=0000 SP=FFEC BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0108 NV UP EI PL NZ NA PO NC
1377:0108 58            POP     AX
AX=3202 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0109 NV UP EI PL NZ NA PO NC
; obviously the flag register has not changed even one bit.
-q
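Why did the XOR with F02A hex change nothing that POPF could keep? On the 8086, the nine flags sit at bits 0, 2, 4, 6, 7, 8, 9, 10 and 11. The flag bits therefore form the mask 0FD5 hex, and F02A hex is exactly its complement. Here is a short Python check of this arithmetic; the names are chosen for illustration.

```python
# Bit positions of the nine 8086 flags.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
             "TF": 8, "IF": 9, "DF": 10, "OF": 11}
FLAG_MASK = sum(1 << bit for bit in FLAG_BITS.values())   # 0FD5h
NON_FLAG_MASK = ~FLAG_MASK & 0xFFFF                        # F02Ah

before = 0x3202                    # value popped into AX by the first POP AX
after_xor = before ^ NON_FLAG_MASK
print(hex(FLAG_MASK), hex(NON_FLAG_MASK), hex(after_xor))  # 0xfd5 0xf02a 0xc228
print((before & FLAG_MASK) == (after_xor & FLAG_MASK))     # True
```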

9. Segment Register Instructions: Segment registers DS and ES can be loaded along with a pointer register by the instructions LDS and LES. Direct moves between segment registers are not allowed; for example, MOV ES, DS is invalid. Moves between a segment register DS, ES or SS and any one of the 8 registers AX, BX, CX, DX, SI, DI, BP and SP are permitted in either direction. CS can be copied into such a register (as in MOV AX, CS), but it cannot be loaded by MOV. If we wish to copy DS to ES, we have to use one of these 8 registers as an intermediate register: MOV AX, DS followed by MOV ES, AX is a valid way of copying DS into ES.

• LDS and LES: LDS stands for load DS and the register given as the first operand of the instruction. LES stands for load ES and the register given as the first operand. The second operand is a memory operand where a 4-byte far address is stored. The word at the lower address (the offset) goes into the named register, and the word at the next higher address (the segment) goes into DS or ES. The following are examples of valid instructions.

        LDS SI, [BX+1234]
        LES SP, [1234]      ; This instruction is given only to indicate a
                            ; very rare possibility. It may not normally have
                            ; any serious use, other than for simultaneously
                            ; loading both the registers ES and SP with a
                            ; single instruction, provided the memory data is
                            ; suitably arranged for this requirement.
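The order in which LDS and LES pick up the 4-byte pointer can be modeled in Python. The low word (offset) goes to the named register and the high word (segment) to DS or ES, with each word stored low byte first. The helper `load_far_pointer` and the sample bytes are hypothetical, chosen only for illustration. The model also ignores offset wrap-around at the end of the segment.

```python
def load_far_pointer(segment_bytes, ea):
    """Model of LDS/LES reg, [ea]: return (register value, segment value)."""
    offset = segment_bytes[ea] | (segment_bytes[ea + 1] << 8)
    segment = segment_bytes[ea + 2] | (segment_bytes[ea + 3] << 8)
    return offset, segment


memory = bytearray(0x2000)
memory[0x1234:0x1238] = bytes([0x00, 0x05, 0x77, 0x13])   # pointer 1377:0500
reg, seg = load_far_pointer(memory, 0x1234)   # e.g. LES SP, [1234]
print(hex(reg), hex(seg))                     # 0x500 0x1377
```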

• CS:, DS:, ES: and SS: : These are segment override prefixes which we have already seen.

10. Miscellaneous Instructions: These instructions cannot be easily classified. These are LEA, NOP, HLT, WAIT and XLAT. The LOCK instruction prefix also can be considered here, as we have not found a place for it elsewhere. We shall now look at them one by one.

• LEA: Load effective address: This instruction computes the effective address of the second operand (the source operand) and stores it in the first operand (the destination operand). The source operand is a memory address (offset part) specified with one of the processor's addressing modes; the destination operand is one of the 8 registers AX, BX, CX, DX, SI, DI, BP or SP.
Operation: DEST ← Effective Address (SRC);
Flags affected: None.
The instruction will not be very useful in the debug environment (why?), but in assembly language programs it will be quite useful, as the following program shows. Get the program assembled and linked, and test it in debug.

data segment
addr    dw 1234h, 5678h     ; two words defined
data ends

code segment
assume cs:code, ds:data
start:  mov ax, data
        mov ds, ax
        lea ax, addr        ; on execution, the address of this data → AX
        lea bx, addr+2      ; address of, or pointer to, the next data → BX
        mov cx, [bx]        ; data word 5678h → CX
        mov bx, ax          ; why is this necessary?
        mov dx, [bx]        ; data word 1234h → DX
        int 01
code ends
end start
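The difference between LEA and a MOV from memory is that LEA delivers the address itself, while MOV fetches the word stored at that address. A Python sketch of the 8086 effective address calculation makes this concrete. The calculation adds base BX or BP, index SI or DI, and a displacement, all taken modulo 10000 hex. The function names and the dictionary of register values are assumptions for illustration. The usage lines assume that `addr` sits at offset 0 of the data segment, holding the bytes 34, 12, 78, 56 (hex).

```python
BASE_REGS = {"BX", "BP"}
INDEX_REGS = {"SI", "DI"}


def effective_address(regs, base=None, index=None, disp=0):
    """8086 EA = [base] + [index] + displacement, modulo 10000h."""
    if base is not None and base not in BASE_REGS:
        raise ValueError(f"{base} cannot be used as a base register")
    if index is not None and index not in INDEX_REGS:
        raise ValueError(f"{index} cannot be used as an index register")
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index]
    return ea & 0xFFFF


def lea(regs, **mode):
    """LEA: the effective address itself."""
    return effective_address(regs, **mode)


def mov_word(data_segment, regs, **mode):
    """MOV reg, [..]: the little-endian word stored at the effective address."""
    ea = effective_address(regs, **mode)
    return data_segment[ea] | (data_segment[ea + 1] << 8)


data = bytes([0x34, 0x12, 0x78, 0x56])        # addr dw 1234h, 5678h
regs = {"BX": lea({}, disp=2)}                 # lea bx, addr+2 -> BX = 2
print(hex(mov_word(data, regs, base="BX")))    # mov cx, [bx] -> 0x5678
```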


• NOP: No operation: This is a one-byte instruction that does nothing except increment the IP by 1. It can also be coded in assembly language as XCHG AX, AX, which also does nothing. The machine code for XCHG AX, AX and NOP is the same: 90H.

• HLT: Go to the HALT state by stopping the cyclic fetch and execute operations of the processor. The processor remains in the HALT state until this state is interrupted by a hardware activation (taken to logic high) on one of its pins: INTR (interrupt request, recognized only when the I flag is set), NMI (non-maskable interrupt) or RESET.

• WAIT or FWAIT: Wait for Test signal or hardware interrupt – INTR, NMI or RESET. This instruction takes the processor to an idle state until one of the following happens: 1. The TEST pin of the processor goes low, or 2. INTR, NMI or RESET goes logic high. With test pin going low, the processor comes out of the wait state and proceeds normally. In case of accepting the interrupt, the processor executes the interrupt service routine and returns to the wait state again. The return address pushed to the stack is the address of the wait instruction itself, and not of the following instruction. This instruction is used for synchronizing the math or I/O coprocessor operations with the 8086.

• LOCK prefix: This instruction prefix is useful for instructions that both read and write a memory location during the execution of a single instruction. This happens when a memory operand is the destination of an instruction, as in XCHG AX, [BX], or INC or DEC of a memory operand. Such an instruction first reads the destination memory location and, at the end of execution, writes the result back to the same location. When the 8086 is used in a multiprocessor, parallel processing environment, these read and write operations must follow each other without any other processor using the common data transfer bus in between. The bus is then said to be locked to the processor for the duration of the read followed by the write, in fact for the duration of the entire instruction. What the prefix actually does is to activate (drive to a logic low voltage) the LOCK pin of the processor for the entire duration of the instruction it prefixes. The data transfer bus, DTB, consists of the lines carrying the data, the address, read/write and the other control signals associated with transferring data between the processor and other system units such as memory. The system bus should be designed so that no other processor can take control of the DTB as long as LOCK# remains activated (the # symbol indicates an active low signal). For example, with the instruction LOCK INC [BX], the LOCK# pin of the processor remains active throughout the execution of the instruction. That is, no other processor can access and control the DTB between two steps of this LOCK-prefixed INC: the reading of the original data at the memory location DS:BX, and the subsequent write-back of the incremented value to the same location. We can only say here that parallel processor systems do require this type of control.

Exercise: Find out and list the types of instruction that can take a LOCK prefix. We have already indicated the XCHG and INC/DEC types. List other instruction types, if any.
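Why the read and the write must not be separated can be shown with a deliberately simple Python simulation. Two processors each execute INC on the same memory word. The function and its two fixed schedules are hypothetical. The unlocked schedule lets both reads happen before either write. The locked schedule completes each read-modify-write before the other processor gets the bus, which is what LOCK# guarantees.

```python
def two_processor_inc(start, locked):
    """Both processors execute INC [mem] once; return the final memory word."""
    memory = start
    if locked:
        # LOCK INC: each read-modify-write completes while the bus is held.
        for _processor in ("A", "B"):
            value = memory                    # read
            memory = (value + 1) & 0xFFFF     # write back
    else:
        # No lock: the bus is granted to the other processor between read and write.
        a = memory                            # processor A reads
        b = memory                            # processor B reads the same value
        memory = (a + 1) & 0xFFFF             # A writes
        memory = (b + 1) & 0xFFFF             # B overwrites A's result
    return memory


print(two_processor_inc(5, locked=False))     # 6: one increment is lost
print(two_processor_inc(5, locked=True))      # 7
```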

In this chapter, we have studied in detail, the instruction set of 8086. The instruction set of all the advanced Intel processors of the IA-32 Architecture, namely, 80x86 and the Pentium processors are all supersets of this basic set; any assembly language program (ALP) written using the instructions we have studied here, can normally be executed in these advanced processors. That is why it is very necessary to understand this instruction set very well if we are to work on these processors at the assembly level. In this chapter, the 8086 instruction set is studied to a sufficient depth, with examples of actual programs in the debug and .asm environments. This study is made in my system, which uses a Pentium mobile processor, working in the real 8086 mode. Advantages of assembly level working, we have already seen. Later chapters will give examples of ALPs at a serious level.

EXERCISES

1. Use the DEBUG to study the following instructions after loading different segment addresses in CS, DS, ES and SS: (i) mov [si], 58 followed by mov ax, [si]; (ii) stosw; (iii) mov [di], 54 (iv) mov ax, 24; (v) add ax, 38; (vi) daa; (vii) mov ax, [bp]; (viii) try another 5 instructions of your choice.

2. Write a program directly in the DEBUG to manipulate the unsigned data available in the registers ax, bx and cx so that ax has the largest of the three and cx has the smallest. Check the working of the program. What will be the modification required to the program if the data are considered as signed numbers?

3. Write a program directly in the DEBUG to shift the 32-bit data in registers dx:ax by one bit to the left. Check the working of the program.

4. Repeat the question 3 with these modifications: (i) one bit shift to the right; (ii) one bit rotate to the right; (iii) one bit rotate through carry to the right and (iv) one bit arithmetic shift to the right.

Below is a demonstration of the working of question 3.

-rax
AX 0000
:7777                   ; initialize ax with 7777 hex (random data chosen for test)
-rdx
DX 0000
:eeee                   ; and dx with EEEE hex; 32-bit data is now EEEE7777 hex
-a                      ; assemble the program (2 instructions)
137B:0100 shl ax, 1
137B:0102 rcl dx, 1     ; watch carefully the two instructions used.
137B:0104               ; program over.
-r                      ; examine the registers at the start of the execution.
AX=7777 BX=0000 CX=0000 DX=EEEE SP=FFEE BP=0000 SI=0000 DI=0000
DS=137B ES=137B SS=137B CS=137B IP=0100 NV UP EI PL NZ NA PO NC
137B:0100 D1E0          SHL     AX,1
-t
AX=EEEE BX=0000 CX=0000 DX=EEEE SP=FFEE BP=0000 SI=0000 DI=0000
DS=137B ES=137B SS=137B CS=137B IP=0102 OV UP EI NG NZ NA PE NC
137B:0102 D1D2          RCL     DX,1
-t
AX=EEEE BX=0000 CX=0000 DX=DDDC SP=FFEE BP=0000 SI=0000 DI=0000
DS=137B ES=137B SS=137B CS=137B IP=0104 NV UP EI NG NZ NA PE CY
137B:0104 0000          ADD     [BX+SI],AL                         DS:0000=CD
; The shifted data is DDDCEEEE hex as seen from dx:ax now.
-q
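The demonstration relies on the carry flag to carry bit 15 of AX into bit 0 of DX. Below is a minimal Python model of SHL AX,1 followed by RCL DX,1; the function name is chosen for illustration. It reproduces the result seen in the debug run and agrees with a direct 32-bit shift.

```python
def shl_rcl_dx_ax(dx, ax):
    """SHL AX,1 then RCL DX,1: shift DX:AX left one bit; returns (dx, ax, cf)."""
    cf = ax >> 15                        # SHL AX,1: bit 15 of AX goes to CF
    ax = (ax << 1) & 0xFFFF              # ... and a 0 enters bit 0
    new_cf = dx >> 15                    # RCL DX,1: bit 15 of DX goes to CF
    dx = ((dx << 1) | cf) & 0xFFFF       # ... and the old CF enters bit 0
    return dx, ax, new_cf


dx, ax, cf = shl_rcl_dx_ax(0xEEEE, 0x7777)
print(hex(dx), hex(ax), cf)                     # 0xdddc 0xeeee 1
print(hex((0xEEEE7777 << 1) & 0xFFFFFFFF))      # 0xdddceeee
```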



3. PROGRAMMING BASICS

Programming is a science and an art as well. Different people writing programs for a given problem at the assembly language level may write different programs altogether. The variations can be in several dimensions. There is some flexibility in allotting registers to the variables, there may be different algorithms available for solving a given problem, some more suitable in certain processors; depending on the processor, some problems may become very simple with certain tricky operations. Some may prefer to write simple programs, simple to understand but perhaps not very efficient; some others may revel in complex but efficient programming and so on.

What is it that we are looking for in a program; when can we say a program is good? We should answer this question first, before attempting to write programs. We could put forth a few criteria for a good program. First, it must solve the given problem completely and for all sets of data. Sometimes the input to the program may not be valid data, such as an input of 0 for the divisor in a division program. In such cases, the program should exit with an indication that the data is invalid. A good program must be easy to follow. In addition, it should be efficient in resource or register usage, in execution time and in memory usage. As memory and other resources become ever more plentiful, it may seem that execution time is the most important thing to economize. However, overindulgence in time optimization at the cost of simplicity may not be worthwhile. The way things are evolving, most features of systems, including their speed, are continuously improving. In this environment, a good program can be one that any programmer of average expertise can easily understand and, if needed, modify. This implies that simplicity of the program may be a primary concern, more important than resource usage or the time taken for execution.

The following sets of alternative programs illustrate that even a simple function may be achieved in many ways. Suppose we wish to round off an eight-bit number to seven significant bits. If the last bit is 0, we leave the number unaltered. If it is 1, we increment the number, so that it becomes the next number, approximately equal to the original eight-bit number to seven-bit accuracy. The following programs keep the number in the AL register at input as well as after modification. The first four do not use any other register; the fifth also uses AH. Five possibilities (among several) are given below.

Alternative 1:
        ROR AL, 1
        ROL AL, 1       ; AL is unchanged, but carry has the l.s. bit
        JNC DOWN
        INC AL
DOWN:

Alternative 2:
        ROR AL, 1
        ROL AL, 1
        ADC AL, 0


Alternative 3:
        INC AL
        AND AL, 0FEh    ; kill the last bit after incrementing

Alternative 4:
        TEST AL, 01     ; TEST does not destroy the data tested
        JZ DOWN
        INC AL
DOWN:

Alternative 5 (uses register AH also):
        MOV AH, 01
        AND AH, AL
        ADD AL, AH
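The five alternatives are meant to be equivalent. One way to gain confidence in this is to model each one in Python and compare them over all 256 possible values of AL. The functions below are a sketch that does not track flags. Each mirrors only the effect of its alternative on AL, with 8-bit wrap-around, so FF hex rounds to 00 in every version.

```python
def alt1(al):  # ROR/ROL leave bit 0 in CF; JNC skips INC when it is 0
    return (al + 1) & 0xFF if al & 1 else al


def alt2(al):  # ADC AL, 0 adds that carry (bit 0) to AL
    return (al + (al & 1)) & 0xFF


def alt3(al):  # INC AL, then AND AL, 0FEh
    return ((al + 1) & 0xFF) & 0xFE


def alt4(al):  # TEST AL, 01 / JZ DOWN / INC AL
    return al if (al & 0x01) == 0 else (al + 1) & 0xFF


def alt5(al):  # MOV AH, 01 / AND AH, AL / ADD AL, AH
    ah = 0x01 & al
    return (al + ah) & 0xFF


alternatives = (alt1, alt2, alt3, alt4, alt5)
print(all(len({f(al) for f in alternatives}) == 1 for al in range(256)))  # True
print([hex(f(0xFF)) for f in alternatives])    # ['0x0', '0x0', '0x0', '0x0', '0x0']
```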

The fact, that even such a simple operation has so many possible ways of programming, indicates that programming is something where different individuals may come up with different versions for doing the same job. The programming language is thus quite flexible, almost similar to our normal languages like English.

We shall now take a little more serious problem, and see how we can program it in different styles, with different levels of goodness, or efficiency.

The problem we take is a 4-digit HEX to 5-digit BCD conversion. BCD to hex and hex to BCD conversions are useful in many situations. We understand BCD or decimal numbers better, but the processor is more at home with hex (actually binary, but binary is practically same as hex and we think of it as hex, for hex is more compact compared to binary). Because of this, at human-machine interaction level, this conversion from hex to BCD as well as BCD to hex will be necessary to make the systems user-friendly. So this program has a serious application.

Basics of Number Base Conversions: There are two basic methods of converting a number from one base to another. The first method consists of separating the digits of the given number, multiplying the digits by the appropriate powers of the base and adding. Suppose we want to convert the hexadecimal number 12A to decimal. We separate the digits 1, 2 and A, multiply each of them, in decimal, by the appropriate power of 16, and add the results in decimal. Accordingly, digit 1 is multiplied by 16² = 256 decimal to get 256, digit 2 is multiplied by 16 decimal to get 32 decimal, and digit A, which is 10 decimal, is multiplied by 1 to get 10. These are now added in decimal, 256 + 32 + 10, to give the value 298 decimal for the number 12A hex. Horner’s rule can be used to simplify these calculations:

12A hex = (1*16 + 2)*16 + 10 decimal = 18*16 + 10 = 288 + 10 = 298 decimal.

Note, in this method all calculations (multiplications and additions) are to be in decimal.
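Horner's rule as used above fits in a few lines of Python. This is a sketch with an illustrative function name. Digits are given most significant first as plain integers, and the arithmetic is done in whatever base the host uses. The same function therefore covers both the 12A hex example and the base-13 number AB5 of the exercise that follows.

```python
def value_by_horner(digits, base):
    """Value of a number from its digits (most significant first)."""
    value = 0
    for digit in digits:
        value = value * base + digit    # (...((d0*b + d1)*b + d2)...)
    return value


print(value_by_horner([1, 2, 0xA], 16))     # 298
print(value_by_horner([10, 11, 5], 13))     # 1838
```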


An alternative method is to divide the hexadecimal number successively by 10 decimal, that is, by 0A hex, using hex throughout as the base of computation. Each division gives a decimal digit as the remainder, and these digits are then put in the proper sequence. According to this method, using hex computation:

12A ÷ 0A = 1D, remainder 8
1D ÷ 0A = 2, remainder 9
2 ÷ 0A = 0, remainder 2

Here we use hexadecimal calculation to separate the decimal digits of the given number, and then we assemble the digits properly. Reading the remainders from the last to the first gives the decimal digits of 12A as 2-9-8, which, assembled, give the decimal number 298. All the calculations that produce the digits are done in hexadecimal; after that, it is only a question of assembling the digits properly. We, being conversant with decimal calculations, will find the first method (with calculations in decimal) more convenient. In computers, however, the method using hexadecimal computations, here the second method of separating the digits by hexadecimal division, is always the simpler one to use. If we want to convert BCD to hex, we would find successive decimal division by 16 more convenient for separating the digits. In the computer, it would be convenient to multiply the decimal digits by powers of 0A in the hexadecimal system and add the results in hex. When going from an arbitrary base to another arbitrary base, we may find it convenient to go via the decimal system using decimal computations. In computers, it will be convenient to go through the hex system using hex calculations.

Exercise: Convert the number AB5 in base 13 to its equivalent in base 12, as decimal-system-using people would do it, and also as a hex-system-using computer would do it.
[Hint: In decimal computation: AB5 in base 13 = (10*13 + 11)*13 + 5 = 1838 decimal. Dividing successively by 12:
1838 ÷ 12 = 153, remainder 2
153 ÷ 12 = 12, remainder 9
12 ÷ 12 = 1, remainder 0
The last quotient, 1, is the leading digit. Hence the result: AB5 in base 13 = 1092 in base 12.
In hexadecimal computation: AB5 = (A*D + B)*D + 5 = 72E hex. Dividing successively by C:
72E ÷ C = 99, remainder 2
99 ÷ C = C, remainder 9
C ÷ C = 1, remainder 0
Hence the result: AB5 in base D = 1092 in base C.]
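The second method, repeated division collecting remainders, can be sketched in Python as below; the function name is chosen for illustration. The first remainder is the least significant digit, so the list is reversed at the end. The sketch reproduces both the 2-9-8 digits of 12A hex and the base-12 digits 1092 of the exercise.

```python
def digits_by_division(value, base):
    """Digits of value in the given base, most significant first."""
    if value == 0:
        return [0]
    remainders = []
    while value:
        value, remainder = divmod(value, base)
        remainders.append(remainder)    # least significant digit first
    return remainders[::-1]


print(digits_by_division(0x12A, 10))    # [2, 9, 8]
print(digits_by_division(0x72E, 12))    # [1, 0, 9, 2]
```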

With this background, we shall now study different programs in different styles for the conversion of 4-digit hex to 5-digit BCD.

Programming Style 1: The first style we use is the simple-minded approach of successively dividing the hex number by 0A hex and properly assembling the decimal digits that we get. We use word-sized division throughout, although some of the divisions could be byte-sized (we are simple-minded in this respect). The method, involving hex operations, is ideally suited to a binary computer. We consider the number to be originally in register AX in hexadecimal, and we want the 5 output digits in DX:AX: the most significant digit in DX and the rest of the digits in AX. We use one register to store each digit, as we have adequate registers. CX is used for storing the divisor 0A initially, and later for the different shift counts required for positioning the digits properly. Registers BX, SI and DI store three of the digits, while the last two digits happen to be in DX and AX, where we finally need them. We need no further registers. The program and its execution are given below. The program is simple and self-explanatory.

The style1.asm program

; 4-digit hex in ax, converted to 5-digit BCD in dx:ax
; regs used: bx, cx, dx, si, di
; algorithm used: separate digits by successively dividing by
; decimal 10 (hex 0a), and aligning the digits appropriately
codeseg segment
assume cs:codeseg
start:  sub dx, dx      ; 0 → dx, prepare to do word division
        mov cx, 10      ; divisor for digit separation
; initially digits are separated and stored in different registers.
        div cx          ; word division
        mov bx, dx      ; l.s.digit → bx
        sub dx, dx
        div cx
        mov si, dx      ; next m.s.digit → si
        sub dx, dx
        div cx
        mov di, dx      ; next m.s.digit → di
        sub dx, dx
        div cx
        xchg ax, dx     ; next m.s.digit → ax and m.s.digit → dx
; at this point dx has the m.s.digit as required, and bx has
; the l.s.digit properly positioned. Other digits will have
; to be properly positioned by shifting appropriately.
        mov cx, 12
        shl ax, cl      ; ax is positioned
        mov cx, 8
        shl di, cl      ; di is positioned
        mov cx, 4
        shl si, cl      ; si is positioned
; now all we require is to assemble the last 4 digits of the
; result in ax.
        add ax, si
        add ax, di
        add ax, bx
        int 01
codeseg ends
end start

Execution of the .exe program:
(i) Program unassembled
-u 0 30
13D6:0000 2BD2          SUB     DX,DX
13D6:0002 B90A00        MOV     CX,000A
13D6:0005 F7F1          DIV     CX
13D6:0007 8BDA          MOV     BX,DX
13D6:0009 2BD2          SUB     DX,DX
13D6:000B F7F1          DIV     CX
13D6:000D 8BF2          MOV     SI,DX
13D6:000F 2BD2          SUB     DX,DX
13D6:0011 F7F1          DIV     CX
13D6:0013 8BFA          MOV     DI,DX
13D6:0015 2BD2          SUB     DX,DX
13D6:0017 F7F1          DIV     CX


13D6:0019 92            XCHG    DX,AX
13D6:001A B90C00        MOV     CX,000C
13D6:001D D3E0          SHL     AX,CL
13D6:001F B90800        MOV     CX,0008
13D6:0022 D3E7          SHL     DI,CL
13D6:0024 B90400        MOV     CX,0004
13D6:0027 D3E6          SHL     SI,CL
13D6:0029 03C6          ADD     AX,SI
13D6:002B 03C7          ADD     AX,DI
13D6:002D 03C3          ADD     AX,BX
13D6:002F CD01          INT     01
(ii) Testing of the program with data FFEF hex = 65519 decimal
-rax
AX 0000
:ffef                   ; initialize ax with the hex data FFEF
-r                      ; display initial register contents
AX=FFEF BX=0000 CX=0031 DX=1234 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0000 NV UP EI PL NZ NA PO NC
13D6:0000 2BD2          SUB     DX,DX   ; execute the 1st instn.
-t16                    ; trace 16 hex (that is, 22 decimal) instructions.
AX=FFEF BX=0000 CX=0031 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0002 NV UP EI PL ZR NA PE NC
13D6:0002 B90A00        MOV     CX,000A ; 2nd
AX=FFEF BX=0000 CX=000A DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0005 NV UP EI PL ZR NA PE NC
13D6:0005 F7F1          DIV     CX      ; 3rd
AX=1997 BX=0000 CX=000A DX=0009 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0007 NV UP EI PL ZR NA PE NC
13D6:0007 8BDA          MOV     BX,DX   ; 4th
AX=1997 BX=0009 CX=000A DX=0009 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0009 NV UP EI PL ZR NA PE NC
13D6:0009 2BD2          SUB     DX,DX   ; 5th

AX=1997 BX=0009 CX=000A DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=000B NV UP EI PL ZR NA PE NC
13D6:000B F7F1          DIV     CX      ; 6th
AX=028F BX=0009 CX=000A DX=0001 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=000D NV UP EI PL ZR NA PE NC
13D6:000D 8BF2          MOV     SI,DX   ; 7th
AX=028F BX=0009 CX=000A DX=0001 SP=0000 BP=0000 SI=0001 DI=0000
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=000F NV UP EI PL ZR NA PE NC
13D6:000F 2BD2          SUB     DX,DX   ; 8th
AX=028F BX=0009 CX=000A DX=0000 SP=0000 BP=0000 SI=0001 DI=0000
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0011 NV UP EI PL ZR NA PE NC
13D6:0011 F7F1          DIV     CX      ; 9th
AX=0041 BX=0009 CX=000A DX=0005 SP=0000 BP=0000 SI=0001 DI=0000
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0013 NV UP EI PL ZR NA PE NC
13D6:0013 8BFA          MOV     DI,DX   ; 10th


AX=0041 BX=0009 CX=000A DX=0005 SP=0000 BP=0000 SI=0001 DI=0005
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0015 NV UP EI PL ZR NA PE NC
13D6:0015 2BD2          SUB     DX,DX   ; 11th
AX=0041 BX=0009 CX=000A DX=0000 SP=0000 BP=0000 SI=0001 DI=0005
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0017 NV UP EI PL ZR NA PE NC
13D6:0017 F7F1          DIV     CX      ; 12th
AX=0006 BX=0009 CX=000A DX=0005 SP=0000 BP=0000 SI=0001 DI=0005
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0019 NV UP EI PL ZR NA PE NC
13D6:0019 92            XCHG    DX,AX   ; 13th
AX=0005 BX=0009 CX=000A DX=0006 SP=0000 BP=0000 SI=0001 DI=0005
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=001A NV UP EI PL ZR NA PE NC
13D6:001A B90C00        MOV     CX,000C ; 14th
AX=0005 BX=0009 CX=000C DX=0006 SP=0000 BP=0000 SI=0001 DI=0005
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=001D NV UP EI PL ZR NA PE NC
13D6:001D D3E0          SHL     AX,CL   ; 15th
AX=5000 BX=0009 CX=000C DX=0006 SP=0000 BP=0000 SI=0001 DI=0005
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=001F NV UP EI PL NZ NA PE NC
13D6:001F B90800        MOV     CX,0008 ; 16th
AX=5000 BX=0009 CX=0008 DX=0006 SP=0000 BP=0000 SI=0001 DI=0005
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0022 NV UP EI PL NZ NA PE NC
13D6:0022 D3E7          SHL     DI,CL   ; 17th
AX=5000 BX=0009 CX=0008 DX=0006 SP=0000 BP=0000 SI=0001 DI=0500
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0024 NV UP EI PL NZ NA PE NC
13D6:0024 B90400        MOV     CX,0004 ; 18th

AX=5000 BX=0009 CX=0004 DX=0006 SP=0000 BP=0000 SI=0001 DI=0500
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0027 NV UP EI PL NZ NA PE NC
13D6:0027 D3E6          SHL     SI,CL   ; 19th
AX=5000 BX=0009 CX=0004 DX=0006 SP=0000 BP=0000 SI=0010 DI=0500
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0029 NV UP EI PL NZ NA PO NC
13D6:0029 03C6          ADD     AX,SI   ; 20th
AX=5010 BX=0009 CX=0004 DX=0006 SP=0000 BP=0000 SI=0010 DI=0500
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=002B NV UP EI PL NZ NA PO NC
13D6:002B 03C7          ADD     AX,DI   ; 21st
AX=5510 BX=0009 CX=0004 DX=0006 SP=0000 BP=0000 SI=0010 DI=0500
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=002D NV UP EI PL NZ NA PO NC
13D6:002D 03C3          ADD     AX,BX   ; 22nd
AX=5519 BX=0009 CX=0004 DX=0006 SP=0000 BP=0000 SI=0010 DI=0500
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=002F NV UP EI PL NZ NA PO NC
13D6:002F CD01          INT     01      ; 23rd instruction, not executed.
; The result in DX:AX is seen to be 6:5519, the equivalent of FFEF hex.
; The registers BX, CX, DX, SI and DI have lost their original contents.
-q
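The register flow of style1.asm can be mirrored in Python to check the digit placement for any input. This sketch, with an illustrative function name, keeps the program's order. Four word divisions by 10 give d0, d1, d2 and d3 as remainders, and the last quotient is d4. The shifts by 12, 8 and 4 then place d3, d2 and d1 in AX.

```python
def style1_hex_to_bcd(ax):
    """Return (dx, ax): d4 in DX and d3 d2 d1 d0 packed in AX."""
    remainders = []
    for _ in range(4):                   # four DIV CX with CX = 10
        ax, dx = divmod(ax, 10)
        remainders.append(dx)            # bx, si, di, then (after XCHG) ax
    d0, d1, d2, d3 = remainders
    dx = ax                              # last quotient: the m.s. digit
    ax = (d3 << 12) + (d2 << 8) + (d1 << 4) + d0
    return dx, ax


dx, ax = style1_hex_to_bcd(0xFFEF)
print(f"{dx:X}:{ax:04X}")                # 6:5519
```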

Review and Comments on the Style of the above Program: As already stated, the program is simple. The only attempt at economical register management is the use of the single register CX. It first stores the divisor and later, after the digit separation, the shift count. For the shift count, no other register would do, since a variable shift count on the 8086 must be held in CL. For storing the divisor 0A, however, another register such as BP could have been used. The program's demands are therefore well within the register resources available. The operations are mindlessly repeated as many times as required, without any attempt to optimize. First come the digit separations using word division, and then the positioning of the words for the final assembly. The data is handled throughout in terms of words, while byte handling at places could have simplified the operation. The style reminds me of the children's story style, with repetitions of identical stuff many times. It is perhaps tolerable in a beginner's program.

Programming Style 2, Efficient Resource Management: In this style we again use the same algorithm and try to improve the program to overcome the deficiencies noted above. After two word divisions, we use byte divisions, and we exercise the utmost economy in register usage by handling the digits as bytes rather than words. The CX register (CL for byte operations) is used both for the divisor and for the shift count. The program conceived on this basis is shown below, along with a demonstration in debug.
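It may help to see the mixed word and byte division in isolation before reading the listing. The Python sketch below models DIV r16 and DIV r8, placing the quotient and remainder as on the 8086 (for a byte divide, AL = quotient and AH = remainder). It then follows the Style 2 instruction sequence step by step. The helper names are illustrative, and the overflow check stands in for the divide-error interrupt (type 0).

```python
def div_word(dx, ax, divisor):
    """DIV r16: DX:AX / divisor -> (AX = quotient, DX = remainder)."""
    quotient, remainder = divmod((dx << 16) | ax, divisor)
    if quotient > 0xFFFF:
        raise OverflowError("divide error (type 0 interrupt on the 8086)")
    return quotient, remainder


def div_byte(ax, divisor):
    """DIV r8: AX / divisor -> AX with AL = quotient, AH = remainder."""
    quotient, remainder = divmod(ax, divisor)
    if quotient > 0xFF:
        raise OverflowError("divide error (type 0 interrupt on the 8086)")
    return (remainder << 8) | quotient


def style2_hex_to_bcd(ax):
    """Follow the Style 2 sequence; return (dx, ax) = (000d4, d3d2d1d0)."""
    ax, dx = div_word(0, ax, 10)     # dx = d0
    bx = dx                          # 000d0 -> bx
    ax, dx = div_word(0, ax, 10)     # dl = d1
    ax = div_byte(ax, 10)            # ah = d2, al = remaining quotient
    bh, ah = ax >> 8, bx >> 8        # XCHG BH, AH
    bx = (bh << 8) | (bx & 0xFF)     # 0d20d0 -> bx
    ax = (ah << 8) | (ax & 0xFF)     # 00 -> ah
    ax = div_byte(ax, 10)            # ah = d3, al = d4
    dl, al = ax & 0xFF, dx & 0xFF    # XCHG DL, AL
    dx = (dx & 0xFF00) | dl          # 000d4 -> dx
    ax = (ax & 0xFF00) | al          # 0d30d1 -> ax
    ax = (ax << 4) & 0xFFFF          # SHL AX, 4: d30d10 -> ax
    ax = (ax + bx) & 0xFFFF          # ADD AX, BX: d3d2d1d0 -> ax
    return dx, ax


print([f"{x:04X}" for x in style2_hex_to_bcd(0xABCD)])   # ['0004', '3981']
```

All the quotients stay within range for every 16-bit input, so the byte divisions never overflow. The test over all 65536 values below is a quick way to confirm this.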

Style2.asm

; As before, the data to be converted is considered to be in register AX.
; Consider the result to be d4d3d2d1d0, where each dn is a nibble of BCD digit.
; We want to have 000d4 in dx, and d3d2d1d0 in ax as the result.
; Registers destroyed: BX, CX and DX
codeseg segment
assume cs:codeseg
begin:  mov cx,0ah      ; decimal 10
        sub dx,dx       ; preparing for word division
        div cx
        mov bx,dx       ; 000d0 → bx
        sub dx,dx       ; for word division again
        div cx          ; 00 → dh, 0d1 → dl
        div cl          ; byte division now
        xchg bh, ah     ; 0d20d0 → bx; 00 → ah for next byte division
        div cl          ; 0d30d4 → ax
        xchg dl, al     ; 000d4 → dx; 0d30d1 → ax
        mov cl, 04
        shl ax,cl       ; d30d10 → ax
        add ax,bx       ; d3d2d1d0 → ax
                        ; dx already has 000d4,
                        ; hence the result is ready
        int 01
codeseg ends
end begin

Execution of the program in the debug
(i) Program unassembled
-u 0 1b
13D5:0000 B90A00        MOV     CX,000A
13D5:0003 2BD2          SUB     DX,DX
13D5:0005 F7F1          DIV     CX
13D5:0007 8BDA          MOV     BX,DX
13D5:0009 2BD2          SUB     DX,DX
13D5:000B F7F1          DIV     CX
13D5:000D F6F1          DIV     CL
13D5:000F 86FC          XCHG    BH,AH
13D5:0011 F6F1          DIV     CL
13D5:0013 86D0          XCHG    DL,AL
13D5:0015 B104          MOV     CL,04
13D5:0017 D3E0          SHL     AX,CL
13D5:0019 03C3          ADD     AX,BX
13D5:001B CD01          INT     01

(ii) Program executed for the data ABCD hex = 43981 decimal
-rax
AX 0000
:abcd
-r
AX=ABCD BX=0000 CX=001D DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0000 NV UP EI PL NZ NA PO NC
13D5:0000 B90A00 MOV CX,000A
-t 0d ; execute the next 0D (hex), i.e. 13 (decimal), instructions
AX=ABCD BX=0000 CX=000A DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0003 NV UP EI PL NZ NA PO NC
13D5:0003 2BD2 SUB DX,DX
AX=ABCD BX=0000 CX=000A DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0005 NV UP EI PL ZR NA PE NC
13D5:0005 F7F1 DIV CX
AX=112E BX=0000 CX=000A DX=0001 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL ZR NA PE NC
13D5:0007 8BDA MOV BX,DX
AX=112E BX=0001 CX=000A DX=0001 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0009 NV UP EI PL ZR NA PE NC
13D5:0009 2BD2 SUB DX,DX
AX=112E BX=0001 CX=000A DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000B NV UP EI PL ZR NA PE NC
13D5:000B F7F1 DIV CX
AX=01B7 BX=0001 CX=000A DX=0008 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000D NV UP EI PL ZR NA PE NC
13D5:000D F6F1 DIV CL
AX=092B BX=0001 CX=000A DX=0008 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI PL ZR NA PE NC
13D5:000F 86FC XCHG BH,AH
AX=002B BX=0901 CX=000A DX=0008 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI PL ZR NA PE NC
13D5:0011 F6F1 DIV CL
AX=0304 BX=0901 CX=000A DX=0008 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0013 NV UP EI PL ZR NA PE NC
13D5:0013 86D0 XCHG DL,AL
AX=0308 BX=0901 CX=000A DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0015 NV UP EI PL ZR NA PE NC
13D5:0015 B104 MOV CL,04
AX=0308 BX=0901 CX=0004 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0017 NV UP EI PL ZR NA PE NC
13D5:0017 D3E0 SHL AX,CL
AX=3080 BX=0901 CX=0004 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0019 NV UP EI PL NZ NA PO NC
13D5:0019 03C3 ADD AX,BX
AX=3981 BX=0901 CX=0004 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=001B NV UP EI PL NZ NA PE NC
13D5:001B CD01 INT 01
-q
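The register flow of Style2.asm can be cross-checked off-line. Below is a minimal Python model; it is not 8086 code.

- Plain integers stand in for the registers.
- The helpers `div16` and `div8` are hypothetical names that imitate DIV with a word divisor and with a byte divisor.
- Divide-overflow faults are not modelled. They cannot occur for these inputs.

Each statement is commented with the instruction it stands for.

```python
def div16(dx, ax, divisor):
    """DIV r16: DX:AX / divisor -> (new AX = quotient, new DX = remainder)."""
    dividend = (dx << 16) | ax
    return dividend // divisor, dividend % divisor


def div8(ax, divisor):
    """DIV r8: AX / divisor -> new AX with AH = remainder, AL = quotient."""
    quotient, remainder = divmod(ax, divisor)
    return (remainder << 8) | quotient


def style2(ax):
    """Model of Style2.asm: returns (DX, AX) = (000d4, d3d2d1d0)."""
    cx = 0x000A                          # mov cx,0ah
    ax, dx = div16(0, ax, cx)            # sub dx,dx / div cx: DX = 000d0
    bx = dx                              # mov bx,dx
    ax, dx = div16(0, ax, cx)            # sub dx,dx / div cx: DL = 0d1
    ax = div8(ax, cx & 0xFF)             # div cl: AH = 0d2
    old_ah, old_bh = ax >> 8, bx >> 8
    bx = (old_ah << 8) | (bx & 0xFF)     # xchg bh,ah: BX = 0d20d0
    ax = (old_bh << 8) | (ax & 0xFF)     # ... and AH = 00
    ax = div8(ax, cx & 0xFF)             # div cl: AX = 0d30d4
    old_dl, old_al = dx & 0xFF, ax & 0xFF
    dx = (dx & 0xFF00) | old_al          # xchg dl,al: DX = 000d4
    ax = (ax & 0xFF00) | old_dl          # ... and AX = 0d30d1
    ax = (ax << 4) & 0xFFFF              # mov cl,04 / shl ax,cl: d30d10
    ax = (ax + bx) & 0xFFFF              # add ax,bx: d3d2d1d0
    return dx, ax


dx, ax = style2(0xABCD)
print(f"DX={dx:04X} AX={ax:04X}")
```

For the data of the trace, ABCD hex, the model prints DX=0004 AX=3981. These are the same final values that DEBUG shows.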

Review and Comments on Style 2: In the program of Style 2, even though the same algorithm is used, the resource management has been very finely tuned to the problem at hand. It may well be impossible to alter a single instruction in this program without reducing its efficiency, even if the rest of the program is adjusted as necessary to keep the result correct. In that sense it is like a good piece of poetry. Good poetry (like the Elegy Written in a Country Churchyard, by Thomas Gray), they say, is such that no single word in it can be replaced by an alternative without somehow degrading the quality of the writing. I therefore call this the poetry style, and it is the style which good programmers normally try to develop. A large number of variations are possible between the simple Style 1 and Style 2, to suit the taste and capabilities of any programmer. As one approaches a reasonably perfect program, it becomes more and more difficult to improve on it. At last one reaches a point where further improvement is not worth the trouble, and that will be the Style 2 program. Below is a program which is only partially optimized, with a style between Styles 1 and 2. It is given without comments, and without a demonstration of its working, for your study.

A Program with a style between Style 1 and Style 2 for hex-to-BCD conversion.
; This program uses regs. BX, CX, DX and SI
codehere segment
        assume cs:codehere
strt:   mov  si,0ah
        mov  cx,4
        sub  dx,dx
        div  si
        mov  bx,dx
        mov  dl,dh
        div  si
        shl  dx,cl
        or   bx,dx
        mov  dl,dh
        div  si
        mov  bh,dl
        mov  dl,dh
        div  si
        xchg ax,dx
        ror  ax,cl
        or   ax,bx
        int  01
codehere ends
        end  strt

Programming Style 3, extracting Full Power from the Instruction Set: This is a very demanding style of programming, in which one tries to exploit the raw power of the processor’s instructions and capabilities as fully as possible. Properly exploited, this method would provide the best possible program for a given job. It may require what is sometimes called ‘out of the box’ thinking. It is not worth spending too much time hunting for such a solution, since it is a creative activity with no guarantee of success. If you get it, you get it; otherwise you don’t, so leave it at that. In our chosen example we still follow the same method of digit separation and positioning, but we do it in a more efficient fashion. The following is the program:

Style3.asm
; Input is assumed in AX as before, and the equivalent 5-digit BCD in DX:AX.
; Registers used: BX, CX and DX.
; Instead of dividing by 10 four times (twice word, and twice byte in style 2),
; here we divide by 100 twice (once word div and once byte div), and each
; remainder below 100 is converted to its tens and units digits using the
; instruction AAM; this is an extended or unusual use of the instruction.
; Note that when AAM handles a two-digit hex value in AL, it is not necessary
; to clear register AH first: AAM automatically loads AH with the upper BCD digit.
code_here segment
        assume cs:code_here
star:   mov  cx,100     ; 100 decimal = 64 hex
        sub  dx,dx      ; prepare for word division
        div  cx         ; 00 -> DH, and hex eq. of d1d0 BCD -> DL
        div  cl         ; 0d4 -> AL, and hex eq. of d3d2 BCD -> AH
        xchg dl,al      ; 000d4 -> DX, and hex of d1d0 -> AL
                        ; (hex of d3d2 BCD is still in AH)
        mov  bl,ah      ; hex of d3d2 -> BL
        aam             ; 0d10d0 -> AX
        xchg bx,ax      ; 0d10d0 -> BX; hex of d3d2 -> AL
        aam             ; 0d30d2 -> AX
        xchg al,bh      ; 0d30d1 -> AX; 0d20d0 -> BX
        rol  ax,cl      ; watch this! (CL = 64h, but the effective rotation is only 4)
                        ; d30d10 -> AX
        add  ax,bx      ; d3d2d1d0 -> AX; DX already has 000d4, all set to finish
        int  01         ; finish
code_here ends
        end  star

Program executed in DEBUG:
(i) Program unassembled
-u 0 18
13D5:0000 B96400 MOV CX,0064
13D5:0003 2BD2 SUB DX,DX
13D5:0005 F7F1 DIV CX
13D5:0007 F6F1 DIV CL
13D5:0009 86D0 XCHG DL,AL
13D5:000B 8ADC MOV BL,AH
13D5:000D D40A AAM
13D5:000F 93 XCHG BX,AX
13D5:0010 D40A AAM
13D5:0012 86C7 XCHG AL,BH
13D5:0014 D3C0 ROL AX,CL
13D5:0016 03C3 ADD AX,BX
13D5:0018 CD01 INT 01
(ii) Program tested with data AFBE hex = 44990 decimal

-rax
AX 0000
:afbe
-r
AX=AFBE BX=0000 CX=001A DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0000 NV UP EI PL NZ NA PO NC
13D5:0000 B96400 MOV CX,0064
-t c ; execute 12 (0C hex) instructions
AX=AFBE BX=0000 CX=0064 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0003 NV UP EI PL NZ NA PO NC
13D5:0003 2BD2 SUB DX,DX
AX=AFBE BX=0000 CX=0064 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0005 NV UP EI PL ZR NA PE NC
13D5:0005 F7F1 DIV CX
AX=01C1 BX=0000 CX=0064 DX=005A SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL ZR NA PE NC
13D5:0007 F6F1 DIV CL ; 5A hex = 90 decimal
AX=3104 BX=0000 CX=0064 DX=005A SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0009 NV UP EI PL ZR NA PE NC
13D5:0009 86D0 XCHG DL,AL ; 31 hex = 49 decimal
AX=315A BX=0000 CX=0064 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000B NV UP EI PL ZR NA PE NC
13D5:000B 8ADC MOV BL,AH
AX=315A BX=0031 CX=0064 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000D NV UP EI PL ZR NA PE NC
13D5:000D D40A AAM ; note AH is overwritten by this instruction
AX=0900 BX=0031 CX=0064 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI PL ZR NA PE NC
13D5:000F 93 XCHG BX,AX ; 5A hex = 90 decimal
AX=0031 BX=0900 CX=0064 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0010 NV UP EI PL ZR NA PE NC
13D5:0010 D40A AAM
AX=0409 BX=0900 CX=0064 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ NA PE NC
13D5:0012 86C7 XCHG AL,BH ; 31 hex = 49 decimal
AX=0409 BX=0900 CX=0064 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0014 NV UP EI PL NZ NA PE NC
13D5:0014 D3C0 ROL AX,CL
AX=4090 BX=0900 CX=0064 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0016 NV UP EI PL NZ NA PE NC
13D5:0016 03C3 ADD AX,BX
AX=4990 BX=0900 CX=0064 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0018 NV UP EI PL NZ NA PE NC
13D5:0018 CD01 INT 01
-q
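The key trick of Style3.asm is AAM applied to any binary value from 0 to 99. It can be checked with a small Python model. This is a hypothetical sketch, not 8086 code.

- `aam` and `rol16` are invented helper names.
- `rol16` reduces the count modulo 16, because rotating a 16-bit value by 16 positions brings it back unchanged.
- The model reproduces the result only, not the time the processor takes.

```python
def aam(ax):
    """AAM: AH <- AL // 10, AL <- AL % 10; the old AH is simply discarded."""
    al = ax & 0xFF
    return ((al // 10) << 8) | (al % 10)


def rol16(value, count):
    """ROL r16,CL: a 16-bit rotation by count equals one by count mod 16."""
    count %= 16
    return ((value << count) | (value >> (16 - count))) & 0xFFFF


def style3(ax):
    """Model of Style3.asm: returns (DX, AX) = (000d4, d3d2d1d0)."""
    cx = 100                             # mov cx,100
    ax, dx = divmod(ax, cx)              # sub dx,dx / div cx: DL = hex of d1d0
    quotient, remainder = divmod(ax, cx & 0xFF)
    ax = (remainder << 8) | quotient     # div cl: AL = 0d4, AH = hex of d3d2
    old_dl, old_al = dx & 0xFF, ax & 0xFF
    dx = (dx & 0xFF00) | old_al          # xchg dl,al: DX = 000d4
    ax = (ax & 0xFF00) | old_dl          # ... and AL = hex of d1d0
    bx = ax >> 8                         # mov bl,ah (BH is still 0)
    ax = aam(ax)                         # aam: AX = 0d10d0
    ax, bx = bx, ax                      # xchg bx,ax
    ax = aam(ax)                         # aam: AX = 0d30d2
    old_al, old_bh = ax & 0xFF, bx >> 8
    ax = (ax & 0xFF00) | old_bh          # xchg al,bh: AX = 0d30d1
    bx = (old_al << 8) | (bx & 0xFF)     # ... and BX = 0d20d0
    ax = rol16(ax, cx & 0xFF)            # rol ax,cl with CL = 64h acts as 4
    ax = (ax + bx) & 0xFFFF              # add ax,bx: d3d2d1d0
    return dx, ax


dx, ax = style3(0xAFBE)
print(f"DX={dx:04X} AX={ax:04X}")
```

For AFBE hex the model prints DX=0004 AX=4990, in agreement with the DEBUG run.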

Review and Comments on Style 3: As pointed out already, a Style 3 solution may not always be available. But when it is available and properly applied, it can give the best results, in the form of a very efficient program. It may also be a little difficult to comprehend, and may require extensive commenting at every instruction used. I would call this a power method or a creative method, requiring an extensive and thorough knowledge of the instruction set. Such tricks can also carry hidden costs. On the original 8086, which does not mask the shift count, ROL AX,CL with CL = 64h actually performs 100 single-bit rotations, at about 4 clocks per bit on top of a fixed 8 clocks. The trick therefore saves the 2-byte MOV CL,04, but not execution time. Normal programmers would do well not to bother too much about this type of programming. Such programs are also difficult to maintain and to modify when required.

Programming Style 4: This style uses an algorithm which is not suited to the processor at hand, and it is presented here as a style to be avoided. In literary terms, it would correspond to using a form of presentation that does not befit the theme, like trying to write a big novel on material suitable only for a short story. Perhaps only consummate artists can do that successfully.

Many processors are not fully geared to handle certain specific types of jobs. The Intel 8086, for example, is not very efficient at computations in decimal. If we perform the 4-digit hex to 5-digit BCD conversion by decimal computation on this processor, we will have to use very circuitous methods. For multi-digit packed decimal arithmetic, all the 8086 offers is the adjustment of a 2-digit (one byte) decimal addition or subtraction, through DAA or DAS. The hex-to-BCD conversion can still be done this way, and here is one way of doing it. But, I repeat, the method is quite complex and wasteful of resources, and it should be avoided.

In the program given, the most significant digit is still computed by successive subtraction, for simplicity. The remaining 4 digits, whose hex value can be at most 270F (= 9999 decimal), are found by decimal computation (see the discussion at the beginning of this chapter). The method consists of finding the place value of each bit in decimal, and adding it to the result as a decimal number if the corresponding bit is present in the hex number being converted. To take a small example, to calculate the decimal value of 10111 binary, we find the weight of each bit in decimal: b4 = 16, b3 = 8, b2 = 4, b1 = 2 and b0 = 1. In the given number b3 is absent, so the decimal value of the number is 16 + 4 + 2 + 1, with the addition done in the decimal system, which works out to 23 decimal. The program is given below, with a test demo. Note that we are not even using Horner’s rule here.

Program Style4.asm
co      segment
        assume cs:co
star:   mov  si,-1      ; in si we want to get the digit d4 using successive
                        ; subtraction of 10000 (dec) from the given number
        mov  bx,10000
back:   inc  si
        sub  ax,bx
        jnc  back
        add  ax,bx      ; the m.s. digit d4 is now in SI; remaining 14-bit no. in AX
        mov  cx,14      ; loop count
        mov  di,ax      ; the remaining 14-bit number to DI for bit checking
        sub  bx,bx      ; BX is where we add the decimal numbers, which
                        ; will give us the final result (along with SI)
        mov  dx,1       ; in DX we have the weight of the current bit in decimal
        jmp  down
loopst: mov  al,dl
        add  al,al
        daa
        mov  dl,al
        mov  al,dh
        adc  al,al
        daa
        mov  dh,al      ; decimal doubling of DX contents
down:   shr  di,1
        jnc  loopend
        mov  al,bl
        add  al,dl
        daa
        mov  bl,al
        mov  al,bh
        adc  al,dh
        daa
        mov  bh,al      ; decimal adding of DX to BX
loopend: or  di,di      ; check for any data in DI
        loopnz loopst   ; loop terminates if DI = 0, or after 14 passes
        mov  ax,bx
        mov  dx,si
        int  01
co      ends
        end  star

-u 0 42
13D5:0000 BEFFFF MOV SI,FFFF
13D5:0003 BB1027 MOV BX,2710
13D5:0006 46 INC SI
13D5:0007 2BC3 SUB AX,BX
13D5:0009 73FB JNB 0006
13D5:000B 03C3 ADD AX,BX
13D5:000D B90E00 MOV CX,000E
13D5:0010 8BF8 MOV DI,AX
13D5:0012 2BDB SUB BX,BX
13D5:0014 BA0100 MOV DX,0001
13D5:0017 EB0F JMP 0028
13D5:0019 90 NOP ; not in the .asm; inserted by the assembler
13D5:001A 8AC2 MOV AL,DL
13D5:001C 02C0 ADD AL,AL
13D5:001E 27 DAA
13D5:001F 8AD0 MOV DL,AL
13D5:0021 8AC6 MOV AL,DH
13D5:0023 12C0 ADC AL,AL
13D5:0025 27 DAA
13D5:0026 8AF0 MOV DH,AL
13D5:0028 D1EF SHR DI,1
13D5:002A 730E JNB 003A
13D5:002C 8AC3 MOV AL,BL
13D5:002E 02C2 ADD AL,DL
13D5:0030 27 DAA
13D5:0031 8AD8 MOV BL,AL
13D5:0033 8AC7 MOV AL,BH
13D5:0035 12C6 ADC AL,DH
13D5:0037 27 DAA
13D5:0038 8AF8 MOV BH,AL
13D5:003A 0BFF OR DI,DI
13D5:003C E0DC LOOPNZ 001A
13D5:003E 8BC3 MOV AX,BX
13D5:0040 8BD6 MOV DX,SI
13D5:0042 CD01 INT 01
-rax
AX 0000
:abcd
-r
AX=ABCD BX=0000 CX=0044 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0000 NV UP EI PL NZ NA PO NC
13D5:0000 BEFFFF MOV SI,FFFF
-t22 ; trace the next 34 (22h) instruction executions
AX=ABCD BX=0000 CX=0044 DX=0000 SP=0000 BP=0000 SI=FFFF DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0003 NV UP EI PL NZ NA PO NC
13D5:0003 BB1027 MOV BX,2710
AX=ABCD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=FFFF DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI PL NZ NA PO NC
13D5:0006 46 INC SI
AX=ABCD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL ZR AC PE NC
13D5:0007 2BC3 SUB AX,BX
AX=84BD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0009 NV UP EI NG NZ NA PE NC
13D5:0009 73FB JNB 0006
AX=84BD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI NG NZ NA PE NC
13D5:0006 46 INC SI
AX=84BD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0001 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ NA PO NC
13D5:0007 2BC3 SUB AX,BX
AX=5DAD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0001 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0009 OV UP EI PL NZ NA PO NC
13D5:0009 73FB JNB 0006
AX=5DAD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0001 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 OV UP EI PL NZ NA PO NC
13D5:0006 46 INC SI
AX=5DAD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0002 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ NA PO NC
13D5:0007 2BC3 SUB AX,BX
AX=369D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0002 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0009 NV UP EI PL NZ NA PO NC
13D5:0009 73FB JNB 0006

AX=369D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0002 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI PL NZ NA PO NC
13D5:0006 46 INC SI
AX=369D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0003 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ NA PE NC
13D5:0007 2BC3 SUB AX,BX
AX=0F8D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0003 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0009 NV UP EI PL NZ NA PE NC
13D5:0009 73FB JNB 0006
AX=0F8D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0003 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI PL NZ NA PE NC
13D5:0006 46 INC SI
AX=0F8D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0004 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ NA PO NC
13D5:0007 2BC3 SUB AX,BX
AX=E87D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0004 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0009 NV UP EI NG NZ NA PE CY
13D5:0009 73FB JNB 0006
AX=E87D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0004 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000B NV UP EI NG NZ NA PE CY
13D5:000B 03C3 ADD AX,BX
AX=0F8D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0004 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000D NV UP EI PL NZ NA PE CY
13D5:000D B90E00 MOV CX,000E
AX=0F8D BX=2710 CX=000E DX=0000 SP=0000 BP=0000 SI=0004 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0010 NV UP EI PL NZ NA PE CY
13D5:0010 8BF8 MOV DI,AX
AX=0F8D BX=2710 CX=000E DX=0000 SP=0000 BP=0000 SI=0004 DI=0F8D
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ NA PE CY
13D5:0012 2BDB SUB BX,BX
AX=0F8D BX=0000 CX=000E DX=0000 SP=0000 BP=0000 SI=0004 DI=0F8D
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0014 NV UP EI PL ZR NA PE NC
13D5:0014 BA0100 MOV DX,0001
AX=0F8D BX=0000 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=0F8D
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0017 NV UP EI PL ZR NA PE NC
13D5:0017 EB0F JMP 0028
AX=0F8D BX=0000 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=0F8D
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0028 NV UP EI PL ZR NA PE NC
13D5:0028 D1EF SHR DI,1
AX=0F8D BX=0000 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=002A NV UP EI PL NZ NA PE CY
13D5:002A 730E JNB 003A
AX=0F8D BX=0000 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=002C NV UP EI PL NZ NA PE CY
13D5:002C 8AC3 MOV AL,BL
AX=0F00 BX=0000 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=002E NV UP EI PL NZ NA PE CY
13D5:002E 02C2 ADD AL,DL
AX=0F01 BX=0000 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0030 NV UP EI PL NZ NA PO NC
13D5:0030 27 DAA
AX=0F01 BX=0000 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0031 NV UP EI PL NZ NA PO NC
13D5:0031 8AD8 MOV BL,AL
AX=0F01 BX=0001 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0033 NV UP EI PL NZ NA PO NC
13D5:0033 8AC7 MOV AL,BH
AX=0F00 BX=0001 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0035 NV UP EI PL NZ NA PO NC
13D5:0035 12C6 ADC AL,DH
AX=0F00 BX=0001 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0037 NV UP EI PL ZR NA PE NC
13D5:0037 27 DAA
AX=0F00 BX=0001 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0038 NV UP EI PL ZR NA PE NC
13D5:0038 8AF8 MOV BH,AL
AX=0F00 BX=0001 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=003A NV UP EI PL ZR NA PE NC
13D5:003A 0BFF OR DI,DI
AX=0F00 BX=0001 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=003C NV UP EI PL NZ NA PE NC
13D5:003C E0DC LOOPNZ 001A ; this is a trace of one run of the loop
-g 42 ; 14 such runs are executed (unless there is termination due to the zero flag)
AX=3981 BX=3981 CX=0002 DX=0004 SP=0000 BP=0000 SI=0004 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0042 NV UP EI PL ZR NA PE NC
13D5:0042 CD01 INT 01 ; here the loop has terminated after 12 runs; still 2 runs to go, as CX = 2
-q
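The decimal doubling and adding in the Style 4 loop rest entirely on DAA. The Python sketch below first models ADD or ADC followed by DAA on packed-BCD bytes. It follows the DAA rules in Intel’s documentation, restricted to the AL result and the carry flag. The sketch then uses this for the two-byte additions of the loop. The function names are hypothetical, and the other flags are ignored.

```python
def add_daa(a, b, carry_in=0):
    """ADD/ADC AL,b followed by DAA on packed-BCD bytes; returns (AL, CF)."""
    total = a + b + carry_in
    aux_carry = (a & 0x0F) + (b & 0x0F) + carry_in > 0x0F
    al, carry = total & 0xFF, total > 0xFF
    old_al, old_carry = al, carry
    if (al & 0x0F) > 9 or aux_carry:     # adjust the low digit
        al = (al + 0x06) & 0xFF
    if old_al > 0x99 or old_carry:       # adjust the high digit
        al = (al + 0x60) & 0xFF
        carry = True
    else:
        carry = False
    return al, int(carry)


def bcd_add16(x, y):
    """Four-digit packed-BCD addition: low byte with ADD, high byte with ADC."""
    low, carry = add_daa(x & 0xFF, y & 0xFF)
    high, _ = add_daa(x >> 8, y >> 8, carry)
    return (high << 8) | low


def style4(ax):
    """Model of Style4.asm: returns (DX, AX) = (000d4, d3d2d1d0)."""
    si = 0
    while ax >= 10000:                   # successive subtraction of 10000
        ax -= 10000
        si += 1
    di, bx, dx = ax, 0x0000, 0x0001      # bits to test, BCD sum, BCD weight
    for _ in range(14):                  # CX = 14 passes at most
        if di & 1:                       # shr di,1 / jnc loopend
            bx = bcd_add16(bx, dx)       # decimal adding of DX to BX
        di >>= 1
        if di == 0:                      # or di,di / loopnz loopst
            break
        dx = bcd_add16(dx, dx)           # decimal doubling of DX
    return si, bx


dx, ax = style4(0xABCD)
print(f"DX={dx:04X} AX={ax:04X}")
```

For ABCD hex the model prints DX=0004 AX=3981. Its inner loop also stops after 12 passes, because 0F8D hex has 12 significant bits. That matches the CX = 2 left over in the trace.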

Review and Comments on Style 4: The method used in this program separates out the most significant digit and then sums the place values of the binary digits in decimal terms. We need 4-digit BCD summation, while the processor supports only 2-digit summation. Even that takes two instructions, ADD or ADC followed by DAA, and one of the operands must be in AL for the decimal adjustment to work. A good part of the program, practically the entire loop, is devoted to these double-byte decimal additions. This is the price paid for using an algorithm whose computations are not well supported by the processor. Here the price is paid in execution time and in the memory required for the program, along with increased use of register resources. Notice that the loop contains many instructions, and they are executed up to 14 times, which weighs heavily on the execution time. This program could be simplified using Horner’s rule for polynomial evaluation, as shown in the ALP below. The program is neither explained nor commented; these are left to the reader as exercises.

co      segment
        assume cs:co
sr:     mov  cx,10000
        mov  dx,-1
back:   inc  dx
        sub  ax,cx
        jnc  back
        add  ax,cx
        shl  ax,1
        shl  ax,1
        mov  bx,ax
        mov  cx,14
        sub  ax,ax
star:   shl  bx,1
        adc  al,al
        daa
        xchg al,ah
        adc  al,al
        daa
        xchg al,ah
        loop star
        int  1
co      ends
        end  sr
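Horner’s rule evaluates a binary number from its most significant bit down. At each step it doubles the partial result and adds the next bit. For the earlier example, 10111 binary = (((1 × 2 + 0) × 2 + 1) × 2 + 1) × 2 + 1 = 23.

If the doubling is done in packed BCD, each step becomes a decimal addition of the result to itself, with the next bit as the carry in. A Python sketch of that idea follows. `add_daa` and `horner_bcd` are hypothetical names, and the sketch illustrates the rule in general; the detailed explanation of the program above remains an exercise.

```python
def add_daa(a, b, carry_in=0):
    """ADD/ADC of two packed-BCD bytes followed by DAA; returns (AL, CF)."""
    total = a + b + carry_in
    aux_carry = (a & 0x0F) + (b & 0x0F) + carry_in > 0x0F
    al, carry = total & 0xFF, total > 0xFF
    old_al, old_carry = al, carry
    if (al & 0x0F) > 9 or aux_carry:
        al = (al + 0x06) & 0xFF
    if old_al > 0x99 or old_carry:
        al = (al + 0x60) & 0xFF
        carry = True
    else:
        carry = False
    return al, int(carry)


def horner_bcd(value, nbits=14):
    """Convert value (< 10000) to 4-digit packed BCD by Horner's rule."""
    result = 0x0000
    for i in reversed(range(nbits)):     # most significant bit first
        bit = (value >> i) & 1
        low, carry = add_daa(result & 0xFF, result & 0xFF, bit)  # 2*low + bit
        high, _ = add_daa(result >> 8, result >> 8, carry)        # 2*high + carry
        result = (high << 8) | low
    return result


print(f"{horner_bcd(0b10111):04X}")     # the 10111 binary example
print(f"{horner_bcd(3981):04X}")
```

The two lines printed are 0023 and 3981. Each bit costs only one doubling, and there is no separate table of place values to build up.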

Programming Style 5, add your own Instructions to the Instruction Set: While programming you may sometimes feel, “Ah! If only there were an instruction that would do this, and this, oh! It would be wonderful!” There is no need to despair: you can cook up your own instructions and use them. When a problem involves a small set of operations to be done again and again, it helps to bundle these operations under a short name and invoke the bundle as many times as you want. To illustrate this, we go back to Programming Style 1, where we had a rather mindless repetition of word division, shifting and adding. We can combine these into a macro instruction bundle and use the macro as many times as we want. This spares us the boredom of writing similar sequences of instructions several times, and the risk of committing mistakes while repeating them. It also makes the program a lot easier to follow, and more elegant. Let us go to the details. Macros are covered more thoroughly in the next chapter.

Style5.asm
; The problem is still the same, namely converting 4-digit hex to 5-digit BCD.
; The hex number to be converted is in register AX, result in DX:AX.
; BX, CX and SI are the other registers used by the program.
code    segment
        assume cs:code
; start:
; first we define the macro bundle 'sep' with shift parameter 'num'
sep     macro num       ; macro with name on the left and parameter(s) on the right
        sub  dx,dx      ; clear DX preparatory to word division
        div  si         ; SI must have decimal 10 when the macro is used
        mov  cl,num
        shl  dx,cl
        add  bx,dx      ; BX is where the decimal number is built up
        endm            ; end of macro bundle
; macro over, we will now use it in the program
start:  mov  si,0ah     ; divisor 10
        sub  bx,bx      ; initialize BX to zero
        sep  00         ; macro used with the shift count 00 to be loaded in CL
        sep  04         ; shift count 4 for the next digit, and so on
        sep  08
        sep  12
        mov  dx,ax      ; the m.s. digit to DX
        mov  ax,bx      ; move the assembled 4 digits to AX
        int  01         ; terminate
code    ends
        end  start
; Apart from the macro definition and the macro use, there are only four
; other instructions in the program, besides the terminating INT 01.
; The assembled program is shown below from DEBUG. See the expanded macros!
-u 0 31
13D5:0000 BE0A00 MOV SI,000A
13D5:0003 2BDB SUB BX,BX
13D5:0005 2BD2 SUB DX,DX
13D5:0007 F7F6 DIV SI
13D5:0009 B100 MOV CL,00 ; macro sep 00
13D5:000B D3E2 SHL DX,CL
13D5:000D 03DA ADD BX,DX
13D5:000F 2BD2 SUB DX,DX
13D5:0011 F7F6 DIV SI
13D5:0013 B104 MOV CL,04 ; macro sep 04
13D5:0015 D3E2 SHL DX,CL
13D5:0017 03DA ADD BX,DX
13D5:0019 2BD2 SUB DX,DX
13D5:001B F7F6 DIV SI
13D5:001D B108 MOV CL,08 ; macro sep 08
13D5:001F D3E2 SHL DX,CL
13D5:0021 03DA ADD BX,DX
13D5:0023 2BD2 SUB DX,DX
13D5:0025 F7F6 DIV SI
13D5:0027 B10C MOV CL,0C ; macro sep 12
13D5:0029 D3E2 SHL DX,CL
13D5:002B 03DA ADD BX,DX
13D5:002D 8BD0 MOV DX,AX
13D5:002F 8BC3 MOV AX,BX
13D5:0031 CD01 INT 01
-q
; The working of this program is not shown; it may be studied as an exercise.
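The DEBUG run of Style5.asm is left as an exercise. The arithmetic that each `sep` expansion performs can be mirrored by a Python function, as in the sketch below. The macro name `sep` is reused for illustration only, and `style5` is a hypothetical name.

The analogy has a limit worth noting. A Python function is called at run time, whereas the assembler copies the macro body into the program at every use. That is why the unassembled listing shows four copies of the same five instructions.

```python
def sep(ax, bx, num, si=0x0A):
    """Arithmetic of one 'sep num' expansion; returns the new (AX, BX)."""
    ax, dx = divmod(ax, si)              # sub dx,dx / div si
    bx = (bx + (dx << num)) & 0xFFFF     # mov cl,num / shl dx,cl / add bx,dx
    return ax, bx


def style5(ax):
    """Model of Style5.asm: returns (DX, AX) = (000d4, d3d2d1d0)."""
    bx = 0                               # sub bx,bx
    for num in (0, 4, 8, 12):            # sep 00 / sep 04 / sep 08 / sep 12
        ax, bx = sep(ax, bx, num)
    return ax, bx                        # mov dx,ax / mov ax,bx


dx, ax = style5(0xABCD)
print(f"DX={dx:04X} AX={ax:04X}")
```

For ABCD hex this prints DX=0004 AX=3981, the same result as the Style 2 program.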

Review and Comments on Style 5: If Style 1 made the program something like a children’s story, Style 5 makes it even simpler; it reduces the program almost to child’s play! When the problem has operations that repeat several times, this style is very simple in terms of visualizing the operations and writing the program. However, time and memory may not be optimized to the extent possible. For the sake of the repetition we have to use word division throughout, and byte operations cannot be used, so the corresponding savings in time and memory are lost. Further, in the above program, the first use of the macro performs an unnecessary shift and add. For the first use, the whole macro could be replaced by:
SUB DX, DX
DIV SI
MOV BX, DX
If that were done, the second instruction of our program, namely SUB BX, BX, would become superfluous and could be omitted without harm. As a basis to be improved towards Style 2, this sort of program is easy to write. The assembled program gives us a starting Style 1 program that can then be taken towards Style 2.

In the foregoing, we have seen a few programming styles. Style 3 is the best from the memory and time efficiency points of view, but it is difficult for those with average ability in the use of the instruction set to venture into. Styles 1 or 5 may form our starting framework, kept at the back of our mind. Using the ideas of these styles on the fly, we could attempt the actual program writing in Style 2, which is perhaps the normal assembly language programmer’s goal. In the initial analysis we have to find the algorithm that best suits the given problem and the processor system we have, so as to avoid programming in Style 4. Style 3 may actually be left out, as its gain will be marginal. It is not worthwhile considering the effort required, as well as the depth and breadth of system knowledge needed. A program written in this style is also poor from the maintainability point of view, and any modification or alteration to it will be quite difficult. In this type of program, a change at one point may produce unanticipated side effects elsewhere, which may turn out to be very hard to catch and correct. Optimality and maintainability are often conflicting requirements. Assembly language programmers are therefore advised not to use Style 3 programs, but to settle for Style 2 or Style 5, with moderate optimization and with adequate comments indicating the logic of the processes. We don’t have experts available all the time to handle modifications or alterations to a program when required. To be maintainable, a program must be understandable at any time, not only to the original programmer but also to any programmer of average expertise.

Although Style 3 programs are not to be used for commercial purposes, they are perhaps the best for learning the art of programming, developing expertise, and gaining a deeper knowledge of the instruction set and the processor.

Exercises:
1. Using the ideas presented in this chapter, write a program to convert a 4-digit BCD number in the AX register to 4-digit hex, also output in the AX register.
2. Study the 6-digit hex to 8-digit BCD conversion program bincvt given last in Fig. 10.35 of the Microprocessor book by Douglas Hall (2nd edition or 2nd revised edition, TMH Publications) and identify the programming style used. Can you think of suitable Style 1 or Style 2 programs in this context?
3. Given below is a Style 3 program, without comments, for converting a 4-digit BCD number in AX to its equivalent hex. The output is in register AX itself. The program uses the BX and CX registers. Figure out the logic of the program and fill in the comments.

co      segment
        assume cs:co
strt:   mov  bx,ax
        and  ax,0f0f0H
        mov  cl,2
        shr  ax,cl
        sub  bx,ax
        shr  ax,1
        sub  bx,ax
        mov  ah,bh
        sub  al,al
        shr  ax,1
        sub  bx,ax
        shr  ax,cl
        sub  bx,ax
        inc  cl
        shr  ax,cl
        add  ax,bx
        int  01
co      ends
        end  strt

4. Here is another optimized program for the 4-digit BCD to 4-digit hex conversion, also given without comments. Test the program and reason out how it works. The program enters with the BCD number in AX and returns the hex result, also in AX; besides AX, it uses just the two registers CX and DX.

code    segment
        assume cs:code
strt:   mov  dx,ax
        mov  cx,0a04h
        and  ax,0f0f0h
        sub  dx,ax
        rol  ax,cl
        mov  cl,dl
        mov  dl,ah
        mul  ch
        add  al,dh
        mov  dh,ah
        mul  ch
        add  ax,dx
        mov  dl,ch
        mul  dx
        mov  ch,dh
        add  ax,cx
        int  1
code    ends
        end  strt

Compare the two programs given in exercises 3 and 4 above. Both are perhaps Style 3 programs. Determine which of the two is worse. You may note that the program of exercise 4 uses Horner’s rule. Observe the way in which the registers are managed and the whole process is optimized in this program.



4. MACROS AND SUBROUTINES

Macros and subroutines often appear to do similar jobs, namely, avoiding writing the same string of instructions several times in a program. However, there are quite a lot of differences between the two. In this chapter we shall look into these differences, and then learn about the proper use of macros and subroutines (or procedures).

Features of a Macro: We have already been introduced to macros in the previous chapter. There we described a macro as a sort of user-defined sequence of instructions, whose operands can be varied through the parameters of the macro. Macros have certain additional capabilities as well, such as handling loops and conditional operations. These need labels, and such labels have to be localized to each particular invocation of the macro, so that they are not repeated when the macro is invoked again. This aspect is explained below:

1. Macros can support local labels for instructions: If we want a loop or a conditional jump, we need to provide a label for the loop start or for the conditional jump destination. Suppose we want a loop to be handled in a macro, and we label the loop start instruction lpst. As we know, when we invoke the macro, the entire sequence of instructions, label and all, is inserted at the point of invocation. Only the parameters of the macro are replaced by the arguments supplied at the invocation, so the label appears unchanged in the expanded sequence. If the macro is invoked only once in the program, this works fine. But it may be invoked more than once, which is why we bundle it as a macro in the first place. In that case every expansion will carry the same label lpst at the start of its loop. The program will then contain one lpst label for every invocation of the macro, and the assembler will reject it because the label is defined more than once. To overcome this problem, such labels must be declared local to the macro, right at the beginning of the macro definition. The following program illustrates the use of local labels for a conditional jump operation:

THE .ASM PROGRAM
code    segment
        assume cs:code
ddd     macro rg,n
        local lbl
        mov  rg,n
        or   rg,rg
        jnz  lbl
        inc  rg
lbl:
        endm
strt:   ddd  ax,0
        ddd  bx,4
        int  01
code    ends
        end  strt

THE .LST FILE OBTAINED FROM THE ASSEMBLER

Microsoft (R) Macro Assembler Version 5.10                   2/14/7
                                                             Page 1-1

 0000                          code segment
                               assume cs:code
                               ddd macro rg,n
                                   local lbl
                                   mov rg,n
                                   or rg,rg
                                   jnz lbl
                                   inc rg
                               lbl:
                                   endm

 0000                          strt: ddd ax,0   ; first invocation of macro,
                                                ; followed by expansion by the assembler
 0000  B8 0000            1        mov ax,0
 0003  0B C0              1        or ax,ax
 0005  75 01              1        jnz ??0000   ; first value of ‘lbl’
 0007  40                 1        inc ax
 0008                     1    ??0000:          ; see how labels are localized
                                     ddd bx,4   ; second invocation & expansion
 0008  BB 0004            1        mov bx,4
 000B  0B DB              1        or bx,bx
 000D  75 01              1        jnz ??0001   ; second value of the same label
 000F  43                 1        inc bx
 0010                     1    ??0001:
 0010  CD 01                       int 01
 0012                          code ends
                               end strt

Microsoft (R) Macro Assembler Version 5.10                   2/14/7
                                                             Symbols-1

Macros:
                N a m e                 Lines

DDD  . . . . . . . . . . . . . .          5

Segments and Groups:
                N a m e                 Length   Align   Combine Class

CODE . . . . . . . . . . . . . .        0012     PARA    NONE

Symbols:
                N a m e                 Type     Value    Attr

STRT . . . . . . . . . . . . . .        L NEAR   0000     CODE
??0000 . . . . . . . . . . . . .        L NEAR   0008     CODE
??0001 . . . . . . . . . . . . .        L NEAR   0010     CODE
@CPU . . . . . . . . . . . . . .        TEXT  0101h
@FILENAME  . . . . . . . . . . .        TEXT  hb
@VERSION . . . . . . . . . . . .        TEXT  510


DEBUGGING THE PROGRAM

-u 0 10
13D5:0000 B80000        MOV     AX,0000
13D5:0003 0BC0          OR      AX,AX
13D5:0005 7501          JNZ     0008
13D5:0007 40            INC     AX
13D5:0008 BB0400        MOV     BX,0004
13D5:000B 0BDB          OR      BX,BX
13D5:000D 7501          JNZ     0010        ; see the different labels produced
13D5:000F 43            INC     BX
13D5:0010 CD01          INT     01
-r
AX=0000 BX=0000 CX=0012 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0000 NV UP EI PL NZ NA PO NC
13D5:0000 B80000        MOV     AX,0000
-g 10

AX=0001 BX=0004 CX=0012 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0010 NV UP EI PL NZ NA PO NC
13D5:0010 CD01          INT     01
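The renaming visible in the listing can be mimicked with a toy Python model: each expansion gives every name declared LOCAL a fresh ??nnnn symbol from one assembler-wide counter, and then substitutes the actual arguments for the formal parameters. This is only a sketch of the observable behavior; the class name `MacroExpander` and the whole-word text substitution are assumptions for illustration, not a description of MASM's internals.

```python
import re


class MacroExpander:
    """Toy model of macro expansion with LOCAL labels (illustration only)."""

    def __init__(self):
        # One counter for the whole assembly, giving ??0000, ??0001, ...
        self.counter = 0

    def expand(self, body, params, args, local_names):
        mapping = {}
        # Each LOCAL name gets a fresh ??nnnn symbol (4 hex digits) per expansion.
        for name in local_names:
            mapping[name] = "??%04X" % self.counter
            self.counter += 1
        # Formal parameters are replaced by the actual arguments.
        mapping.update(zip(params, args))
        # Substitute whole words only, so the 'n' inside 'inc' or 'jnz' is untouched.
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
        return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in body]


# Body of the ddd macro from the program above.
DDD_BODY = ["mov rg,n", "or rg,rg", "jnz lbl", "inc rg", "lbl:"]

asm = MacroExpander()
first = asm.expand(DDD_BODY, ["rg", "n"], ["ax", "0"], ["lbl"])
second = asm.expand(DDD_BODY, ["rg", "n"], ["bx", "4"], ["lbl"])
print(first)   # ['mov ax,0', 'or ax,ax', 'jnz ??0000', 'inc ax', '??0000:']
print(second)  # ['mov bx,4', 'or bx,bx', 'jnz ??0001', 'inc bx', '??0001:']
```

The two expansions receive ??0000 and ??0001, the same names that appear as jump targets in the listing.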

2. Defining the macros at the beginning: Macros can be defined anywhere in the assembly language program, as long as the definition comes before the first invocation. Most programmers, however, prefer to define the macros right at the beginning of the program. This also makes the program easier to understand and debug, as the entire set of macros can be seen in one place. Subroutines, in contrast, are separate from the main program anyway, and can appear anywhere in the program module. In the program above, the macro was defined at the beginning; that was necessary there, as the very first instruction of the program invokes the macro. But even when it is not necessary, it is good practice to define the macros at the beginning. The following program is an example.

; An example (for illustration only; otherwise this macro replaces only a
; single instruction) of defining the macro at the start of the code segment,
; and of using a segment override in a macro parameter.
co segment
assume cs:co
mox macro reg, n
    mov reg,n          ; macro defined here, seen only by the assembler
    endm
begin: mov bx,20       ; the program starts here with a normal instruction
    mox ax, cs:[bx]    ; note how the segment override is used
    mox cl, 04         ; note that it applies to 8-bit registers also
    int 01
co ends
end begin

Note: the macro body is substituted textually, so the first invocation expands to 'mov ax, cs:[bx]', which the assembler encodes as a CS: override prefix (2E) followed by MOV AX,[BX]. Writing the override in front of the macro name, as in 'cs: mox ax, [bx]', is not valid.

The assembled program as seen in DEBUG:

13D5:0000 BB1400        MOV     BX,0014     ; 20 decimal = 14 hex
13D5:0003 2E            CS:
13D5:0004 8B07          MOV     AX,[BX]
13D5:0006 B104          MOV     CL,04
13D5:0008 CD01          INT     01

3. Macros do not require maintaining the stack balance: Procedures need to maintain stack balance. The return address, which is on the stack top on entry to the subroutine, must again be on the stack top when the return instruction is executed. This means that whatever is pushed onto the stack in the subroutine must be popped before the return instruction, and nothing more than that may be popped. If this condition is not satisfied, the proper return address will not be on the stack top, RET will return to a wrong address, and the program will behave unpredictably. Macros have no such requirement, as the stack is not used at all in managing macros. The following macro illustrates this:

; the macro below just pushes 3 registers onto the stack
pushreg macro r1, r2, r3
    push r1
    push r2
    push r3
    endm

Such a macro is useful at the beginning of a subroutine for saving three registers on the stack at one stroke. A matching popreg macro, placed at the end of the subroutine, recovers the pushed registers:

popreg macro r1, r2, r3
    pop r3
    pop r2
    pop r1
    endm

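Note that the registers are popped in the reverse order, so that both macros can be invoked with the same parameter sequence, for example pushreg ax, bx, cx at the start and popreg ax, bx, cx at the end. Also note that this operation cannot be done with subroutines: a procedure that pushed the registers would find them, and not its return address, on the stack top when its RET is executed.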
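The stack argument can be checked with a small Python model in which a list stands for the stack, CALL appends a return address and RET pops whatever is on top. The function `pushreg_as_procedure` and the return address 0103h are hypothetical, made up only to show what would go wrong if pushreg were written as a procedure.

```python
def pushreg(stack, regs, r1, r2, r3):
    """pushreg macro: push r1, r2, r3 in that order."""
    for r in (r1, r2, r3):
        stack.append(regs[r])


def popreg(stack, regs, r1, r2, r3):
    """popreg macro: pop in reverse order, so the same argument list works."""
    for r in (r3, r2, r1):
        regs[r] = stack.pop()


def pushreg_as_procedure(stack, regs, r1, r2, r3, return_address):
    """A hypothetical 'push procedure': CALL, three pushes, then RET."""
    stack.append(return_address)      # CALL pushes the return address
    pushreg(stack, regs, r1, r2, r3)  # the body pushes the registers
    return stack.pop()                # RET pops the stack top: r3's value


regs = {"ax": 0x1111, "bx": 0x2222, "cx": 0x3333}
stack = []
pushreg(stack, regs, "ax", "bx", "cx")
regs.update(ax=0, bx=0, cx=0)         # the subroutine body uses the registers
popreg(stack, regs, "ax", "bx", "cx")
print(regs, stack)                    # registers restored, stack empty again

bad_return = pushreg_as_procedure([], regs, "ax", "bx", "cx", 0x0103)
print(hex(bad_return))                # CX's value, not the return address
```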

4. The parameters of the macro are more flexible than those of the subroutines: In assembly language, the parameters of a subroutine are passed in registers or through the stack. This fixes the size of each parameter; the option of using either word or byte size parameters is not normally available for subroutines, whereas for macros any operand that makes sense in the instruction is valid. In the program given under point 2 above, the macro mox is invoked at two places. In the first invocation, the reg parameter is the register AX, and the n parameter is indirect addressing through register BX with a CS segment override, that is, the data in memory at address CS:BX. In the second invocation, the parameter reg is the 8-bit register CL, while the parameter n is just the number 04. Such wide flexibility is unthinkable in procedures. In section 7 of this chapter, and also in chapter 6, we will see that processor opcodes can also be used as macro parameters, which means a single macro can carry out different operations depending on the opcode supplied at its invocation.

5. Macros are expanded in the executable program, while subroutines are executed by a returnable jump: When a macro is invoked, the sequence of instructions making up the macro is inserted directly at the place of invocation, with the parameters substituted. This implies two things. Firstly, compared with a procedure, a macro increases the size of the executable machine language program every time it is invoked; secondly, the program executes faster than it would with a procedure doing the same job, because the overhead of storing the return address, jumping to the subroutine and jumping back to the stored address is absent.
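The size side of this trade-off is simple arithmetic. The Python sketch below assumes a body of `body` bytes, a 3-byte near CALL (E8 plus a 16-bit displacement) per invocation and a 1-byte near RET (C3), and it ignores any extra instructions a procedure may need for loading its parameter registers; the function names are illustrative.

```python
CALL_BYTES = 3   # near CALL: E8 + 16-bit displacement
RET_BYTES = 1    # near RET: C3


def macro_bytes(body, n):
    """Code size when a body of `body` bytes is expanded inline n times."""
    return body * n


def procedure_bytes(body, n):
    """Code size with one copy of the body plus RET, and n CALLs."""
    return body + RET_BYTES + CALL_BYTES * n


def break_even(body):
    """Smallest n for which the procedure version is strictly smaller."""
    if body <= CALL_BYTES:
        return None  # an inline body this short is never larger than a CALL
    # body + RET + CALL*n < body*n  <=>  n > (body + RET) / (body - CALL)
    return (body + RET_BYTES) // (body - CALL_BYTES) + 1


print(macro_bytes(16, 2), procedure_bytes(16, 2), break_even(16))
```

So a 16-byte body is already smaller as a procedure at two invocations, while a body of three bytes or less never is; in every case the macro still saves the execution time of a CALL and a RET at each invocation.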

6. Macros exist only in the ALP and not at the machine language level: Beyond the five points above, there is a fundamental difference between macros and subroutines. Macros exist only at the assembly language level, while subroutines are seen at the machine language level also. The processor has hardware provisions for handling subroutines (CALL and RET, with the return address stored on the stack), while at the machine language level no macros are visible. Macros are only shortcuts at the assembly language level; they are handled by the assembler software and do not appear as separate entities in the executable machine language program.

As a consequence of points 5 and 6 above, it is common practice to write macros for small operations, while large and complex operations repeated several times are handled through subroutines. An example of a useful macro that improves on the DIV instruction is given below:

Macro Smart-div: The divide instruction is rather restrictive. It takes a double size dividend (a double word for word division, or a word for byte division) and a single size divisor, and produces a single size quotient and a single size remainder. When a double size dividend is divided by a single size divisor, the quotient cannot always be held in a single size register; for unsigned division it fits exactly when the high half of the dividend is below the divisor. Whenever the quotient exceeds the single size, the processor does not carry out the division; it simply signals a divide overflow by generating an internal interrupt (type 0, the divide error interrupt). The programmer can take care of this, as we did in the case of style 2 and elsewhere in Chapter 3. But that is not always possible, since the size of the quotient may not be known beforehand. In such cases, to ensure that the program does not get caught at this point, we can write a macro that carries out the division properly, producing a double size quotient instead of a single size one. The macro given here follows almost the same register allocation as DIV for its inputs, that is, DX:AX for the dividend of a word division, or AX (AH:AL) for the dividend of a byte division. The same registers carry the quotient after the division. The divisor is specified as a parameter of the macro, and an additional single size parameter specifies the register that receives the remainder. The macro is defined below with examples of its use.

THE PROGRAM

code segment
assume cs:code
smdiv macro d1, d2, dv, rr
;; d1:d2 : double size dividend, dv : divisor, rr : remainder
    local down
    sub rr,rr          ;; clear remainder register
    cmp d1,dv
    jb down            ;; if below, only one div is enough, so go down
    xchg rr,d1         ;; else, save d1 in rr and load 00 in d1
    xchg rr,d2         ;; net result of the 2 instns: 0 -> d1, d1 -> d2, d2 -> rr
    div dv             ;; first division: rem. -> d1, quot. -> d2
    xchg rr,d2         ;; m.s. quotient -> rr, l.s. part of dividend -> d2
down: div dv           ;; l.s. quotient -> d2, remainder -> d1
    xchg rr,d1         ;; remainder -> rr, m.s. quotient (or 0) -> d1
    endm
; examples of smart divide macro use
start: mov dx, 0abch
    mov ax, 1234h
    mov bx, 45abh
    smdiv dx, ax, bx, cx
    mov bx, 0dabch     ; a hex constant must start with a digit
    mov ax, 2345h
    mov cl, 35
    smdiv ah, al, cl, bl
    int 01
code ends
end start

DEBUG OPERATIONS

-u 0 33
13D5:0000 BABC0A        MOV     DX,0ABC
13D5:0003 B83412        MOV     AX,1234
13D5:0006 BBAB45        MOV     BX,45AB
13D5:0009 2BC9          SUB     CX,CX
13D5:000B 3BD3          CMP     DX,BX
13D5:000D 7206          JB      0015
13D5:000F 87CA          XCHG    CX,DX
13D5:0011 91            XCHG    CX,AX       ; 1st expansion of smdiv
13D5:0012 F7F3          DIV     BX
13D5:0014 91            XCHG    CX,AX
13D5:0015 F7F3          DIV     BX
13D5:0017 87CA          XCHG    CX,DX
13D5:0019 BB0001        MOV     BX,DABC     ; this data will be destroyed by smdiv
13D5:001C B84523        MOV     AX,2345
13D5:001F B123          MOV     CL,23
13D5:0021 2ADB          SUB     BL,BL
13D5:0023 3AE1          CMP     AH,CL
13D5:0025 7208          JB      002F
13D5:0027 86DC          XCHG    BL,AH
13D5:0029 86D8          XCHG    BL,AL       ; 2nd expansion of smdiv
13D5:002B F6F1          DIV     CL
13D5:002D 86D8          XCHG    BL,AL
13D5:002F F6F1          DIV     CL
13D5:0031 86DC          XCHG    BL,AH
13D5:0033 CD01          INT     01
-r
AX=0000 BX=0000 CX=0035 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0000 NV UP EI PL NZ NA PO NC
13D5:0000 BABC0A        MOV     DX,0ABC
-t 14

AX=0000 BX=0000 CX=0035 DX=0ABC SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0003 NV UP EI PL NZ NA PO NC
13D5:0003 B83412        MOV     AX,1234

AX=1234 BX=0000 CX=0035 DX=0ABC SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI PL NZ NA PO NC
13D5:0006 BBAB45        MOV     BX,45AB

AX=1234 BX=45AB CX=0035 DX=0ABC SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0009 NV UP EI PL NZ NA PO NC
13D5:0009 2BC9          SUB     CX,CX       ; 1st smdiv

AX=1234 BX=45AB CX=0000 DX=0ABC SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000B NV UP EI PL ZR NA PE NC
13D5:000B 3BD3          CMP     DX,BX

AX=1234 BX=45AB CX=0000 DX=0ABC SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000D NV UP EI NG NZ NA PE CY
13D5:000D 7206          JB      0015

AX=1234 BX=45AB CX=0000 DX=0ABC SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0015 NV UP EI NG NZ NA PE CY
13D5:0015 F7F3          DIV     BX

AX=2771 BX=45AB CX=0000 DX=44B9 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0017 NV UP EI NG NZ NA PE CY
13D5:0017 87CA          XCHG    CX,DX

AX=2771 BX=45AB CX=44B9 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0019 NV UP EI NG NZ NA PE CY
13D5:0019 BB0001        MOV     BX,DABC     ; result: quotient 00002771 hex, remainder 44B9 hex

AX=2771 BX=DABC CX=44B9 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=001C NV UP EI NG NZ NA PE CY
13D5:001C B84523        MOV     AX,2345

AX=2345 BX=DABC CX=44B9 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=001F NV UP EI NG NZ NA PE CY
13D5:001F B123          MOV     CL,23

AX=2345 BX=DABC CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0021 NV UP EI NG NZ NA PE CY
13D5:0021 2ADB          SUB     BL,BL       ; 2nd smdiv

AX=2345 BX=DA00 CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0023 NV UP EI PL ZR NA PE NC
13D5:0023 3AE1          CMP     AH,CL

AX=2345 BX=DA00 CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0025 NV UP EI PL ZR NA PE NC
13D5:0025 7208          JB      002F

AX=2345 BX=DA00 CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0027 NV UP EI PL ZR NA PE NC
13D5:0027 86DC          XCHG    BL,AH

AX=0045 BX=DA23 CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0029 NV UP EI PL ZR NA PE NC
13D5:0029 86D8          XCHG    BL,AL

AX=0023 BX=DA45 CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=002B NV UP EI PL ZR NA PE NC
13D5:002B F6F1          DIV     CL

AX=0001 BX=DA45 CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=002D NV UP EI PL ZR NA PE NC
13D5:002D 86D8          XCHG    BL,AL

AX=0045 BX=DA01 CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=002F NV UP EI PL ZR NA PE NC
13D5:002F F6F1          DIV     CL

AX=2201 BX=DA01 CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0031 NV UP EI PL ZR NA PE NC
13D5:0031 86DC          XCHG    BL,AH

AX=0101 BX=DA22 CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0033 NV UP EI PL ZR NA PE NC
13D5:0033 CD01          INT     01          ; result: quotient 0101 hex, remainder 22 hex
-q
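The arithmetic of smdiv can be checked independently of the processor. The Python sketch below models registers `bits` wide and performs the same steps as the macro: a single division when the high half is below the divisor, otherwise a division of 0:d1 followed by a division of that remainder combined with d2. The function name `smart_div` is ours; the first two checks use the operands traced above.

```python
def smart_div(d1, d2, dv, bits=16):
    """Model of the smdiv macro: divide the double-size value d1:d2 by dv.

    d1 is the high half and d2 the low half, each `bits` wide (16 for DX:AX,
    8 for AH:AL). Returns (d1, d2, rr) as the macro leaves them: the
    double-size quotient in d1:d2 and the remainder in rr.
    """
    if dv == 0:
        raise ZeroDivisionError("DIV by zero raises the divide error interrupt")
    if d1 < dv:
        # cmp d1,dv / jb down: one DIV suffices and the quotient fits in d2
        q_low, rem = divmod((d1 << bits) | d2, dv)
        return 0, q_low, rem
    # first DIV on 0:d1 gives the high quotient; its remainder is below dv
    q_high, r1 = divmod(d1, dv)
    # second DIV on r1:d2; since r1 < dv, this quotient also fits in d2
    q_low, rem = divmod((r1 << bits) | d2, dv)
    return q_high, q_low, rem


print([hex(v) for v in smart_div(0x0ABC, 0x1234, 0x45AB)])      # word case
print([hex(v) for v in smart_div(0x23, 0x45, 0x23, bits=8)])    # byte case
```

Because the remainder of the first step is always below the divisor, the second DIV can never overflow, which is why the macro is safe for any non-zero divisor.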

Exercise: Multiplication of two byte size data need not always produce a word size result. For example, multiplying the byte 32 hex by the byte 05 hex produces a byte size result, 0FA hex. Write a macro that, when the product fits in a single byte, returns it only in register AL, without altering register AH, and indicates this by returning with the carry flag cleared. If the result is word size, register AH is altered to hold the complete word result, and this is indicated by a set carry flag. (MUL itself sets the carry flag according to whether the result is a byte or a word; the idea of this macro is to preserve the AH register when the result does not need it.) The macro may not really be very useful; it is given only as an academic exercise. The solution is given below:

THE PROGRAM

code segment
assume cs: code
smmul macro mr         ; mr is the multiplier register
    local lbl
    push cx
    push ax
    mul mr
    pop cx             ; CX = AX saved before MUL, so CH = original AH
    jc lbl             ; carry is set by mul instn, if the result is a full word.
    mov ah,ch          ; byte result: put the original AH back
lbl: pop cx            ; restore CX
    endm
start: mov ax, 5632h
    mov cl, 05
    smmul cl
    smmul cl
    int 01
code ends
end start

DEBUGGING

-u 0 19
13D5:0000 B83256        MOV     AX,5632
13D5:0003 B105          MOV     CL,05
13D5:0005 51            PUSH    CX
13D5:0006 50            PUSH    AX
13D5:0007 F6E1          MUL     CL
13D5:0009 59            POP     CX
13D5:000A 7202          JB      000E        ; jc lbl (JC and JB are the same opcode)
13D5:000C 8AE5          MOV     AH,CH
13D5:000E 59            POP     CX
13D5:000F 51            PUSH    CX
13D5:0010 50            PUSH    AX
13D5:0011 F6E1          MUL     CL
13D5:0013 59            POP     CX
13D5:0014 7202          JB      0018
13D5:0016 8AE5          MOV     AH,CH
13D5:0018 59            POP     CX
13D5:0019 CD01          INT     01
-r
AX=0000 BX=0000 CX=001B DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0000 NV UP EI PL NZ NA PO NC
13D5:0000 B83256        MOV     AX,5632
-t2

AX=5632 BX=0000 CX=001B DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0003 NV UP EI PL NZ NA PO NC
13D5:0003 B105          MOV     CL,05

AX=5632 BX=0000 CX=0005 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0005 NV UP EI PL NZ NA PO NC
13D5:0005 51            PUSH    CX
-g f

AX=56FA BX=0000 CX=0005 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI PL NZ NA PO NC
13D5:000F 51            PUSH    CX          ; 32 * 05 = FA, in AL; AH returned unchanged; carry clear
-g 19

AX=04E2 BX=0000 CX=0005 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0019 OV UP EI PL NZ NA PO CY
13D5:0019 CD01          INT     01          ; FA * 05 = 4E2, in AX (AH:AL); AH is occupied by the result; carry set
-q
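The register juggling in smmul can be modeled in Python to confirm the two results in the trace: MUL forms AL times the 8-bit operand in AX, the carry flag is set when the high byte of the product is non-zero, and the caller's AH is put back only when carry is clear. `smmul` here is a hypothetical Python function mirroring the macro, not part of the program.

```python
def smmul(ax, mr):
    """Model of the smmul macro: AL * mr (8-bit). Returns (new AX, carry flag)."""
    product = (ax & 0xFF) * (mr & 0xFF)    # MUL r8: AX = AL * r8
    carry = product > 0xFF                 # MUL sets CF (and OF) when AH != 0
    if carry:
        return product, True               # full word result occupies AX
    return (ax & 0xFF00) | product, False  # keep the caller's AH, product in AL


ax, cf = smmul(0x5632, 0x05)
print(hex(ax), cf)                         # first invocation in the trace
ax, cf = smmul(ax, 0x05)
print(hex(ax), cf)                         # second invocation in the trace
```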

A macro to take in 4 BCD digits from the keyboard, using DOS interrupt 21h, function 1: We often need to read a 4-digit BCD number from the keyboard. Here is a simple macro to do the job. It accepts exactly 4 BCD digits from the keyboard, ignoring non-BCD keys, and returns them packed in the register given as its parameter. It can be improved by making the "$" key a terminating key, so that numbers of fewer than 4 digits can be entered, and so that, after a wrong entry, all 4 digits can be entered again to get the correct number.

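bcd4 macro reg         ; register in which the number is to be returned
    local again
    xor bx, bx         ; BX accumulates the packed BCD number
    mov cx, 0404h      ; CL = 4 (shift count), CH = 4 (digits still needed)
    mov ah, 1          ; DOS function 1: read a key (with echo) into AL
again: int 21h
    cmp al, 30h
    jb again           ; below '0': ignore the key
    cmp al, 39h
    ja again           ; above '9': ignore the key
    sub al, 30h        ; ASCII digit -> BCD digit
    shl bx, cl         ; make room for the new digit
    add bl, al
    dec ch
    jnz again
    mov reg, bx        ; note: AX, BX and CX are destroyed by the macro
    endm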
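The digit handling of bcd4 can be sketched in Python, with a string standing in for the keys that DOS function 1 would return (an assumption made only so the logic can be run without DOS). Non-digit keys are skipped exactly like the cmp al,30h / cmp al,39h tests, and each accepted digit is shifted into the packed BCD result four bits at a time.

```python
def bcd4(keys):
    """Model of the bcd4 macro: pack the first four decimal digit keys into a word."""
    bx = 0
    count = 4                                    # CH in the macro
    for ch in keys:
        al = ord(ch)
        if al < 0x30 or al > 0x39:
            continue                             # not '0'..'9': read another key
        bx = ((bx << 4) & 0xFFFF) + (al - 0x30)  # shl bx,cl (cl = 4); add bl,al
        count -= 1
        if count == 0:
            return bx                            # mov reg,bx
    raise ValueError("fewer than four digit keys were supplied")


print(hex(bcd4("1a2-34")))   # the letters and '-' are ignored
```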

7. Power of the macros to realize variable operations: The following example shows how a single macro can perform either addition or subtraction of two data items consisting of many words. The trick is to pass the opcode itself as a parameter of the macro. The program takes two large numbers of identical word length and either adds or subtracts them, storing the result at a third place. If the two operands are n words in size, the result space must be n+1 words in size.

The .ASM program

data segment
dat  dw 234h, 5678h, 89abh, 7604h, 0abc0h, 3 dup(0)
daat dw 0abcdh, 2348h, 253h, 4589h, 0fb23h, 3 dup(0)
dat1 dw 8 dup(?)
dat2 dw 8 dup(?)
num  dw 5
data ends
;
code segment
assume cs: code, ds: data, es: data
multas macro src1, src2, res, n, as
    local bak
    mov si, offset src1
    mov di, offset res
    mov bx, offset src2 - src1 - 2 ;; SI is already advanced by 2 when
                                   ;; [si + bx] is used, hence the -2
    mov cx, n
    cld
    clc
bak: lodsw             ;; ax = next word of src1, si = si + 2
    as ax, [si + bx]   ;; add/subtract the same word of src2, with carry
    stosw              ;; store into res, di = di + 2
    loop bak           ;; LODSW, STOSW and LOOP leave CF untouched
    as cx, cx          ;; cx = 0 here, so cx = final carry (adc) or 0/FFFFh (sbb)
    mov [di], cx
    endm
;
strt: mov ax, data
    mov ds, ax
    mov es, ax
    multas dat, daat, dat1, num, adc
    multas dat, daat, dat2, num, sbb
    int 1
code ends
end strt

-u 0 3b
13E1:0000 B8DC13        MOV     AX,13DC
13E1:0003 8ED8          MOV     DS,AX
13E1:0005 8EC0          MOV     ES,AX       ; from here on, only the two macros
13E1:0007 BE0000        MOV     SI,0000
13E1:000A BF2000        MOV     DI,0020
13E1:000D BB0E00        MOV     BX,000E
13E1:0010 8B0E4000      MOV     CX,[0040]
13E1:0014 FC            CLD
13E1:0015 F8            CLC
13E1:0016 AD            LODSW
13E1:0017 1300          ADC     AX,[BX+SI]
13E1:0019 AB            STOSW
13E1:001A E2FA          LOOP    0016
13E1:001C 13C9          ADC     CX,CX
13E1:001E 890D          MOV     [DI],CX
13E1:0020 BE0000        MOV     SI,0000
13E1:0023 BF3000        MOV     DI,0030
13E1:0026 BB0E00        MOV     BX,000E
13E1:0029 8B0E4000      MOV     CX,[0040]
13E1:002D FC            CLD
13E1:002E F8            CLC
13E1:002F AD            LODSW
13E1:0030 1B00          SBB     AX,[BX+SI]
13E1:0032 AB            STOSW
13E1:0033 E2FA          LOOP    002F
13E1:0035 1BC9          SBB     CX,CX
13E1:0037 890D          MOV     [DI],CX
13E1:0039 CD01          INT     01
13E1:003B 83EC08        SUB     SP,+08
-g

AX=B09D BX=000E CX=FFFF DX=0000 SP=0000 BP=0000 SI=000A DI=003A
DS=13DC ES=13DC SS=13DC CS=13E1 IP=003B NV UP EI NG NZ AC PE CY
13E1:003B 83EC08        SUB     SP,+08
-d0 4f
13DC:0000  34 02 78 56 AB 89 04 76-C0 AB 00 00 00 00 00 00   4.xV...v........
13DC:0010  CD AB 48 23 53 02 89 45-23 FB 00 00 00 00 00 00   ..H#S..E#.......
13DC:0020  01 AE C0 79 FE 8B 8D BB-E3 A6 01 00 00 00 00 00   ...y............
13DC:0030  67 56 2F 33 58 87 7B 30-9D B0 FF FF 00 00 00 00   gV/3X.{0........
13DC:0040  05 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
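The carry chain of multas can be reproduced in Python to confirm the memory dump above. The sketch treats each operand as a list of 16-bit words, least significant word first as in dat and daat, and passes the carry or borrow from one word to the next as ADC and SBB do; `multas` here is a Python stand-in for the macro.

```python
def multas(src1, src2, op):
    """Model of multas: word-by-word ADC or SBB over little-endian word lists.

    Returns n + 1 words; the last word is what 'adc cx,cx' or 'sbb cx,cx'
    leaves in CX (which is 0 beforehand): 0 or 1 for ADC, 0 or FFFFh for SBB.
    """
    carry = 0                              # clc
    result = []
    for a, b in zip(src1, src2):
        if op == "adc":
            total = a + b + carry
            carry = 1 if total > 0xFFFF else 0
        elif op == "sbb":
            total = a - b - carry
            carry = 1 if total < 0 else 0
        else:
            raise ValueError("op must be 'adc' or 'sbb'")
        result.append(total & 0xFFFF)      # stosw
    result.append(carry if op == "adc" else (-carry) & 0xFFFF)
    return result


DAT = [0x0234, 0x5678, 0x89AB, 0x7604, 0xABC0]
DAAT = [0xABCD, 0x2348, 0x0253, 0x4589, 0xFB23]
print([hex(w) for w in multas(DAT, DAAT, "adc")])   # words at dat1
print([hex(w) for w in multas(DAT, DAAT, "sbb")])   # words at dat2
```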

8. Here is a beautiful example of the use of macros for realizing hardware functions in an easily understandable way: The example below uses a matrix keyboard, as shown in the circuit, with a flow chart for a simple key identification subroutine and a program that follows that flow chart. There are different methods of interpreting the keys, some permitting only one key press at a time and some permitting multiple keys at a time. The program given here is a simple interpretation that allows one key pressed at a time.

A program to identify the key pressed, working on the above hardware and following the flow chart given. Note how the use of macros simplifies the programming:

; KEYID routine
; The subroutine keyfind below does the following: 1. wait for all previous
; keys to be released; 2. wait for a new key press; 3. give a debounce delay;
; 4. if the key still remains pressed, identify it and return with the key
; number in register AH (CY flag set if no key could be identified).
; On entry, DX must hold the address of port A of the 8255.
; The procedure assumes a hardware circuit as shown in the figure above,
; and a flow chart, also as shown above.
;
assume cs:code
code segment
; First we start with macro definitions
anykey macro
    mov al, 00         ;; all rows to be 0's
    out dx, al         ;; dx has address of port A of 8255
    add dx,2
    in al, dx          ;; column in, through port B
    sub dx,2
    and al, 0Fh
    cmp al, 0Fh
    endm
;;
rows macro patn, num
    mov ah, num
    mov al, patn
    out dx, al
    add dx,2
    in al, dx
    sub dx,2
    and al, 0Fh
    cmp al, 0Fh
    jnz colchk         ;; key pressed in the row, so check columns
    endm
;;
cols macro             ;; find the column of the pressed key, and return
    local back
    mov cx, 4
back: ror al,1
    jnc found
    dec ah
    loop back
    jmp err
    endm
;;
;; debounce macro is a standard delay macro using cx as counter;
;; (20n - 8) clocks of delay will be produced normally by this macro
debounce macro n
    local lup
    mov cx, n
lup: nop
    loop lup
    endm

strt: call keyfind
    int 01
;
keyfind proc near
start: anykey
    jnz start          ; some key pressed, so wait for its release; else
                       ; all previous keys are cleared, now look for a fresh key
again: anykey
    jz again           ; if no key is pressed, try again; else debounce
    debounce 5000      ; will produce about 20 msec delay on a 5 MHz clock
    rows 0Eh, 3
    rows 0Dh, 7
    rows 0Bh, 0Bh
    rows 07, 0Fh
err: stc               ; no row has a key pressed: indicate error through CY flag
found: ret
colchk: cols           ; find the column having the key pressed & return
keyfind endp
code ends
end strt
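The scanning logic of keyfind can be mirrored in Python to see which key number each row and column produces. The circuit figure is not reproduced here, so the model assumes that when bit r of the pattern written to port A is 0, a pressed key in row r and column c pulls bit c of port B low; `keypad` and `identify_key` are hypothetical names.

```python
# (row pattern written to port A, key number loaded into AH), as in keyfind
ROWS = [(0x0E, 0x03), (0x0D, 0x07), (0x0B, 0x0B), (0x07, 0x0F)]


def keypad(pressed):
    """Return a port-B reader for one pressed key (row, col), or None for no key."""
    def read_columns(pattern):
        if pressed is None:
            return 0x0F
        row, col = pressed
        if (pattern & (1 << row)) == 0:      # this row is driven low
            return 0x0F & ~(1 << col)        # the key pulls its column low
        return 0x0F
    return read_columns


def identify_key(read_columns):
    """Mirror of the rows/cols macros: returns (key number, carry flag)."""
    for pattern, num in ROWS:
        cols = read_columns(pattern) & 0x0F  # and al,0Fh
        if cols != 0x0F:                     # cmp al,0Fh / jnz colchk
            ah = num
            for _ in range(4):               # mov cx,4 ... loop back
                if (cols & 1) == 0:          # ror al,1 moved a 0 into CF: jnc found
                    return ah, False
                cols >>= 1
                ah -= 1                      # dec ah
            return None, True                # jmp err
    return None, True                        # err: stc


print(identify_key(keypad((2, 1))))          # row 2, column 1
```

Under this assumed wiring, row 0 yields keys 3 down to 0, row 1 keys 7 down to 4, and so on, which matches the starting values passed to the rows macro.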

The List file for the above program as obtained from the assembler MASM: note how the macros are expanded.

                    ;KEYID routine
                    ; this subroutine keyid below, does the following: 1. wait for clearing of all
                    ; previous keys; 2. wait for a new key press. 3. Give a debounce delay. 4. if
                    ; the key is still remaining pressed, then identify and return with the key
                    ; number in reg AH. The procedure assumes a hardware circuit as shown in the
                    ; figure above, and a flow chart, also as shown above.
                    ;
                    assume cs:code
0000                code segment
                    ; First we start with macro definitions
                    anykey macro
                        mov al, 00      ;; all rows to be 0's
                        out dx, al      ;; dx has address of port A of 8255
                        add dx,2
                        in al, dx       ;; column in, through port B
                        sub dx,2
                        and al, 0Fh
                        cmp al, 0Fh
                        endm
                    ;;
                    rows macro patn, num
                        mov ah, num
                        mov al, patn
                        out dx, al
                        add dx,2
                        in al, dx
                        sub dx,2
                        and al, 0Fh
                        cmp al, 0Fh
                        jnz colchk      ;; key pressed in the row, so check columns
                        endm
                    ;;
                    cols macro
                        local back
                        mov cx, 4
                    back: ror al,1
                        jnc found
                        dec ah
                        loop back
                        jmp err
                        endm
                    ;;
                    ;; debounce macro is a standard delay macro using cx as counter
                    ;; now the subroutine using these macros
                    ;
                    debounce macro n
                        local lup
                        mov cx, n
                    lup: nop
                        loop lup
                        endm

0000  E8 0005 R     strt: call keyfind
0003  CD 01               int 01
                    ;
0005                keyfind proc near
0005                start: anykey
0005  B0 00       1     mov al, 00
0007  EE          1     out dx, al
0008  83 C2 02    1     add dx,2
000B  EC          1     in al, dx
000C  83 EA 02    1     sub dx,2
000F  24 0F       1     and al, 0Fh
0011  3C 0F       1     cmp al, 0Fh
0013  75 F0             jnz start
0015                again: anykey
0015  B0 00       1     mov al, 00
0017  EE          1     out dx, al
0018  83 C2 02    1     add dx,2
001B  EC          1     in al, dx
001C  83 EA 02    1     sub dx,2
001F  24 0F       1     and al, 0Fh
0021  3C 0F       1     cmp al, 0Fh
0023  74 F0             jz again
                        debounce 5000
0025  B9 1388     1     mov cx, 5000
0028  90          1 ??0000: nop
0029  E2 FD       1     loop ??0000
                        rows 0Eh, 3
002B  B4 03       1     mov ah, 3
002D  B0 0E       1     mov al, 0Eh
002F  EE          1     out dx, al
0030  83 C2 02    1     add dx,2
0033  EC          1     in al, dx
0034  83 EA 02    1     sub dx,2
0037  24 0F       1     and al, 0Fh
0039  3C 0F       1     cmp al, 0Fh
003B  75 38       1     jnz colchk
                        rows 0Dh, 7
003D  B4 07       1     mov ah, 7
003F  B0 0D       1     mov al, 0Dh
0041  EE          1     out dx, al
0042  83 C2 02    1     add dx,2
0045  EC          1     in al, dx
0046  83 EA 02    1     sub dx,2
0049  24 0F       1     and al, 0Fh
004B  3C 0F       1     cmp al, 0Fh
004D  75 26       1     jnz colchk
                        rows 0Bh, 0Bh
004F  B4 0B       1     mov ah, 0Bh
0051  B0 0B       1     mov al, 0Bh
0053  EE          1     out dx, al
0054  83 C2 02    1     add dx,2
0057  EC          1     in al, dx
0058  83 EA 02    1     sub dx,2
005B  24 0F       1     and al, 0Fh
005D  3C 0F       1     cmp al, 0Fh
005F  75 14       1     jnz colchk
                        rows 07, 0Fh
0061  B4 0F       1     mov ah, 0Fh
0063  B0 07       1     mov al, 07
0065  EE          1     out dx, al
0066  83 C2 02    1     add dx,2
0069  EC          1     in al, dx
006A  83 EA 02    1     sub dx,2
006D  24 0F       1     and al, 0Fh
006F  3C 0F       1     cmp al, 0Fh
0071  75 02       1     jnz colchk
0073  F9                err: stc
0074  C3                found: ret
0075                    colchk: cols
0075  B9 0004     1     mov cx, 4
0078  D0 C8       1 ??0001: ror al,1
007A  73 F8       1     jnc found
007C  FE CC       1     dec ah
007E  E2 F8       1     loop ??0001
0080  EB F1       1     jmp err
0082                keyfind endp
0082                code ends
                        end strt

Macros:
        N a m e                     Lines
ANYKEY . . . . . . . . . . . . .      7
COLS . . . . . . . . . . . . . .      6
DEBOUNCE . . . . . . . . . . . .      3
ROWS . . . . . . . . . . . . . .      9

Segments and Groups:
        N a m e                     Length   Align   Combine   Class
CODE . . . . . . . . . . . . . .    0082     PARA    NONE

Symbols:
        N a m e                     Type     Value   Attr
AGAIN  . . . . . . . . . . . . .    L NEAR   0015    CODE
COLCHK . . . . . . . . . . . . .    L NEAR   0075    CODE
ERR  . . . . . . . . . . . . . .    L NEAR   0073    CODE
FOUND  . . . . . . . . . . . . .    L NEAR   0074    CODE
KEYFIND  . . . . . . . . . . . .    N PROC   0005    CODE    Length = 007D
START  . . . . . . . . . . . . .    L NEAR   0005    CODE
STRT . . . . . . . . . . . . . .    L NEAR   0000    CODE
??0000 . . . . . . . . . . . . .    L NEAR   0028    CODE
??0001 . . . . . . . . . . . . .    L NEAR   0078    CODE
@CPU . . . . . . . . . . . . . .    TEXT  0101h
@FILENAME  . . . . . . . . . . .    TEXT  keyid
@VERSION . . . . . . . . . . . .    TEXT  510


A note on the key debounce operation: The debounce delay ensures that a key being released is not seen as repeated presses. When a key is released, its contacts bounce: they open briefly and then close again briefly several times before finally breaking. This appears as repeated pressing of the same key a few times. The illusion is avoided by accepting a key only if, once detected, it is still detected after a delay long enough for the vibrations to cease; such a key is a genuine new key press and not a key bounce. Similarly, a key being pressed also bounces and could appear as multiple presses of the same key; this is avoided by reading the key only after the debounce delay. Look at the flow chart from this point of view.
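The delay quoted in the debounce macro follows from the 8086 clock counts commonly given for these instructions: MOV reg,imm takes 4 clocks, NOP 3, and LOOP 17 when it jumps and 5 when it falls through. The Python sketch below adds these up to the macro's (20n - 8) clocks and converts them to milliseconds; the 5 MHz default is the clock assumed in the program's comment, and `count_for_delay` is an illustrative helper for choosing n.

```python
import math


def debounce_clocks(n):
    """Clocks taken by the debounce macro for a count n (1 <= n <= 65535)."""
    # mov cx,n (4) + (n - 1) passes of nop (3) + taken loop (17)
    # + a last pass of nop (3) + not-taken loop (5)  =  20n - 8
    return 4 + 20 * (n - 1) + 3 + 5


def debounce_ms(n, clock_mhz=5.0):
    """Delay in milliseconds at the given clock frequency."""
    return debounce_clocks(n) / (clock_mhz * 1000.0)


def count_for_delay(ms, clock_mhz=5.0):
    """Smallest count n giving at least `ms` milliseconds of delay."""
    return math.ceil((ms * clock_mhz * 1000.0 + 8) / 20)


print(debounce_clocks(5000), debounce_ms(5000))
print(count_for_delay(20), debounce_ms(65535))
```

A full 20 ms at 5 MHz therefore needs n = 5001, and since CX is 16 bits, the longest delay this macro can give at 5 MHz is about 262 ms.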

ALP Procedures or Subroutines: The features of procedures, as compared with those of macros, have already been discussed along with the macros. We will now make the comparison clearer by doing the smart divide as a subroutine. This procedure is given in the book on microprocessors by Douglas V. Hall; it is presented here as a near procedure (that is, a procedure in the same code segment) with a small modification. A procedure can be used several times without repeating its sequence of instructions, either in the ALP or in the machine language (executable) version of the program. It is good practice to state the following in comments at the beginning of a procedure: (i) which registers, stack locations or memory locations must hold the input variables when the procedure is called; (ii) where the output variables are located when the procedure returns; (iii) which registers are used by the procedure and have their contents destroyed; and, most importantly, (iv) what the procedure actually does. With this information, the user can call the procedure confidently, saving the contents of any register that the procedure destroys before the call and restoring the saved data after the return. Here is an example of a main program calling a procedure to perform a smart word division.

THE PROGRAM

data segment
dta dw 4567h, 0abcdh, 789ah, 1234h, 5678h, 89abh
rlt dw 6 dup (?)
data ends
; The first 2 words of dta are the dividend (low word first), while the third
; word is the divisor. The 4th and 5th words form the dividend for the next
; trial, and the sixth is the next divisor. The first two words of rlt receive
; the first quotient (low word first) and the third its remainder. Similarly,
; the fourth and fifth words are the quotient of the second division and the
; sixth is its remainder.

code segment
assume cs:code, ds:data, es:data
; we are using macros here to make the data loading and storing simpler
movdata macro          ; load data into registers: DX:AX = dividend, BX = divisor
    lodsw
    mov bx,ax
    lodsw
    mov dx,ax
    lodsw
    xchg ax,bx
    endm
strlt macro            ; store result in memory: quotient DX:AX, remainder CX
    stosw
    mov ax,dx
    stosw
    mov ax,cx
    stosw
    endm
start: mov ax, data    ; main program starts from here
    mov ds, ax
    mov es, ax
    mov si, offset dta
    cld                ; for string operations
    lea di, rlt
    movdata
    call smart_div
    strlt
    movdata
    call smart_div
    strlt
    int 01

smart_div proc near
; The procedure takes the dividend from DX:AX and the divisor from BX, and
; returns the quotient in DX:AX and the remainder in CX; the divisor is
; returned unaltered. CX is used along with DX and AX; all other registers
; are returned unaltered. Note that the procedure provides no flexibility in
; the use of registers, and it cannot be used for byte division.
    sub cx, cx
    cmp dx, bx
    jb down            ; high word below divisor: one DIV is enough
    xchg ax, cx        ; cx = low word of dividend, ax = 0
    xchg ax, dx        ; ax = high word of dividend, dx = 0
    div bx             ; ax = high quotient, dx = remainder
    xchg ax, cx        ; cx = high quotient, ax = low word of dividend
down: div bx           ; ax = low quotient, dx = remainder
    xchg dx, cx        ; dx = high quotient (or 0), cx = remainder
    ret
smart_div endp
code ends
end start

TESTING IN DEBUG

-u 0 44
13D7:0000 B8D513        MOV     AX,13D5
13D7:0003 8ED8          MOV     DS,AX
13D7:0005 8EC0          MOV     ES,AX
13D7:0007 BE0000        MOV     SI,0000
13D7:000A FC            CLD
13D7:000B 8D3E0C00      LEA     DI,[000C]
13D7:000F AD            LODSW
13D7:0010 8BD8          MOV     BX,AX
13D7:0012 AD            LODSW
13D7:0013 8BD0          MOV     DX,AX       ; load input
13D7:0015 AD            LODSW
13D7:0016 93            XCHG    BX,AX
13D7:0017 E81B00        CALL    0035
13D7:001A AB            STOSW
13D7:001B 8BC2          MOV     AX,DX
13D7:001D AB            STOSW               ; store result
13D7:001E 8BC1          MOV     AX,CX
13D7:0020 AB            STOSW
13D7:0021 AD            LODSW
13D7:0022 8BD8          MOV     BX,AX
13D7:0024 AD            LODSW
13D7:0025 8BD0          MOV     DX,AX       ; load input
13D7:0027 AD            LODSW
13D7:0028 93            XCHG    BX,AX
13D7:0029 E80900        CALL    0035
13D7:002C AB            STOSW
13D7:002D 8BC2          MOV     AX,DX
13D7:002F AB            STOSW               ; store result
13D7:0030 8BC1          MOV     AX,CX
13D7:0032 AB            STOSW
13D7:0033 CD01          INT     01
13D7:0035 2BC9          SUB     CX,CX
13D7:0037 3BD3          CMP     DX,BX
13D7:0039 7205          JB      0040
13D7:003B 91            XCHG    CX,AX
13D7:003C 92            XCHG    DX,AX
13D7:003D F7F3          DIV     BX
13D7:003F 91            XCHG    CX,AX
13D7:0040 F7F3          DIV     BX
13D7:0042 87D1          XCHG    DX,CX
13D7:0044 C3            RET
-g 17

AX=4567 BX=789A CX=0065 DX=ABCD SP=0000 BP=0000 SI=0006 DI=000C
DS=13D5 ES=13D5 SS=13D5 CS=13D7 IP=0017 NV UP EI PL NZ NA PO NC
13D7:0017 E81B00        CALL    0035        ; before call
-d 0 17
13D5:0000  67 45 CD AB 9A 78 34 12-78 56 AB 89 00 00 00 00   gE...x4.xV......
13D5:0010  00 00 00 00 00 00 00 00                           ........          ; data input in memory
-g 1a

AX=6CAE BX=789A CX=54BB DX=0001 SP=0000 BP=0000 SI=0006 DI=000C
DS=13D5 ES=13D5 SS=13D5 CS=13D7 IP=001A OV UP EI PL NZ NA PE NC
13D7:001A AB            STOSW               ; after return
-g 29

AX=1234 BX=89AB CX=54BB DX=5678 SP=0000 BP=0000 SI=000C DI=0012
DS=13D5 ES=13D5 SS=13D5 CS=13D7 IP=0029 OV UP EI PL NZ NA PE NC
13D7:0029 E80900        CALL    0035        ; before call
-g 2c

AX=A0CB BX=89AB CX=079B DX=0000 SP=0000 BP=0000 SI=000C DI=0012
DS=13D5 ES=13D5 SS=13D5 CS=13D7 IP=002C OV UP EI NG NZ AC PO CY
13D7:002C AB            STOSW               ; after return
-g 33

AX=079B BX=89AB CX=079B DX=0000 SP=0000 BP=0000 SI=000C DI=0018
DS=13D5 ES=13D5 SS=13D5 CS=13D7 IP=0033 OV UP EI NG NZ AC PO CY
13D7:0033 CD01          INT     01
-d 0 17
13D5:0000  67 45 CD AB 9A 78 34 12-78 56 AB 89 AE 6C 01 00   gE...x4.xV...l..
13D5:0010  BB 54 CB A0 00 00 9B 07                           .T......          ; input & output data
-q

Comparing the smart divide procedure with the corresponding smart divide macro shows clearly the flexibility provided by the macro. With the macro we could do byte or word division with a single definition, but the procedure above cannot be used for byte division; a separate procedure has to be written for smart byte division. Moreover, the macro lets us choose other registers for the divisor and the remainder, simply by invoking it with those registers as parameters; procedures do not have this flexibility. In the above program only one register, BX, can hold the divisor, and only one register, CX, can receive the remainder.

Passing Parameters to Subroutines: How are the parameters of a subroutine specified? A macro is invoked with its parameters specified explicitly, but a procedure is simply called without any arguments; all the parameters are implied. If the parameters are few, they can be passed in specific registers. But if a relatively large number of parameters, say five or more, are to be sent to the procedure, the registers may be needed for manipulations and computations in the subroutine, and the parameters will have to be stored, possibly as an array, in memory. The procedure may need to handle the parameters in a random order, so pushing them onto the stack and popping them in last-in first-out order is unsuitable. Instead, the parameters can be stored in memory as an array whose starting address is held in an address register; any parameter can then be retrieved from this array, any number of times, by indexed addressing. When the subroutine returns, these memory locations should be made free for other uses by the program.

Recursive and re-entrant programs pose a further difficulty in parameter passing. A recursive program is one that calls itself. A simple example is a program calculating the factorial of a number, using the recursive equation n! = n*(n - 1)! with the terminating condition 1! = 1. The procedure for calculating n! first checks whether n = 1. If so, it returns the value 1; otherwise it stores n in memory, decrements n and calls itself again. This continues until n is reduced to 1, when the value 1! = 1 is returned; this is used to compute 2!, from which 3! is computed, and so on, until n! is computed and returned. A re-entrant program is one that starts executing for one set of parameters and, part way through, is called again to repeat the computation for another set of parameters. Such a requirement may arise as follows. Suppose a program is executing a floating point ADD subroutine, and in the middle of this execution the processor is interrupted by some system hardware. The interrupt service routine starts executing, leaving the floating point ADD routine suspended. If the interrupt service routine also requires a floating point ADD, it calls the same procedure, but with a different set of parameters. The floating point ADD routine is then said to have been re-entered with different parameters. On return from the interrupt service routine, the suspended floating point ADD routine must resume from where it left off, which means its parameters must not have been disturbed by the re-entered procedure. Many system programs need to be re-entrant.

Both recursion and re-entrance require a different, non-overlapping set of locations for the parameters every time the procedure is called. There are several ways of meeting both requirements: random access to the parameters, and a non-overlapping region of memory for every new invocation of the routine. Sometimes parameter overlap can be avoided altogether by suitably rearranging the recursive equation, as the first example below shows; where that is not possible, the stack can be used to store the parameters temporarily, as in the second example.
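To see why every invocation needs its own storage, here is a small Python sketch (an illustration only, not 8086 code). The hypothetical function `fact_shared` keeps n in one fixed "memory location" (a dictionary entry), as a non-re-entrant routine might, while the equally hypothetical `fact_frames` gives every call its own copy of n.

```python
# One fixed location for the parameter n, shared by every invocation.
shared = {"n": 0}


def fact_shared(n):
    shared["n"] = n                 # store n in the single fixed location
    if n <= 1:
        return 1
    sub = fact_shared(n - 1)        # the inner call overwrites shared["n"]
    return shared["n"] * sub        # reads the clobbered value, not this call's n


def fact_frames(n):
    frame = {"n": n}                # a fresh, non-overlapping "frame" per call
    if frame["n"] <= 1:
        return 1
    return frame["n"] * fact_frames(frame["n"] - 1)
```

With shared storage every level ends up multiplying by the value left behind by the innermost call, which is 1, so `fact_shared(4)` returns 1; with a frame per call, `fact_frames(8)` returns 40320 (9D80h).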

EXAMPLE 1: A RECURSIVE PROGRAM FOR FACTORIAL; PASSING PARAMETERS IN REGISTERS
; computing n! without using the stack for parameter store

The program below uses the recursive equation n! = n*(n-1)!, with the terminating condition defined as 1! = 0! = 1. The parameters passed are the value of n, in BX, and the identity element for multiplication, namely 1, in register AX.

code segment
assume cs:code
strt:   mov ax, 1       ; parameters to be passed in ax and bx
        mov bx, 8       ; n = 8
        call fa
        int 1
fa proc near
        cmp bx, 1
        jna return
        mul bx          ; multiplication is done first, so no need to store n
        dec bx
        call fa
return: ret
fa endp
code ends
end strt

-u 0 16
13DC:0000 B80100 MOV AX,0001
13DC:0003 BB0800 MOV BX,0008
13DC:0006 E80200 CALL 000B
13DC:0009 CD01 INT 01
13DC:000B 83FB01 CMP BX,+01
13DC:000E 7606 JBE 0016
13DC:0010 F7E3 MUL BX
13DC:0012 4B DEC BX
13DC:0013 E8F5FF CALL 000B
13DC:0016 C3 RET
-r
AX=0000 BX=0000 CX=0017 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=0000 NV UP EI PL NZ NA PO NC
13DC:0000 B80100 MOV AX,0001
-g

AX=9D80 BX=0001 CX=0017 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=000B NV UP EI PL ZR NA PE NC
13DC:000B 83FB01 CMP BX,+01
-q

EXAMPLE 2
; using the stack for temporary parameter store
; the program below uses the recursion slightly differently
; for passing parameters through registers. In this,
; you pass input parameter n through reg BX; output in ax.
; registers used: ax, bx and dx (for word multiplication)
; this program uses the recursive equation n! = (n-1)!*n.
; the terminating condition is 1! = 0! = 1; there is no
; need to save this condition in a register as in the earlier
; program. However, this program needs the use of the stack
; to store the value of the parameter n in the stack.
; Reason out why.
assume cs:code
code segment
strt:   call fac
        int 1
        jmp strt
fac proc near
        mov ax, 1
        cmp ax, bx
        jae return
        push bx         ; store n temporarily (till 'ret' from next 'call')
        dec bx
        call fac
        pop bx          ; retrieve the stored n for multiplication
        mul bx          ; multiply by n.
return: ret
fac endp
code ends
end strt

-u 0 16
13DB:0000 E80400 CALL 0007
13DB:0003 CD01 INT 01
13DB:0005 EBF9 JMP 0000
13DB:0007 B80100 MOV AX,0001
13DB:000A 3BC3 CMP AX,BX
13DB:000C 7308 JNB 0016
13DB:000E 53 PUSH BX
13DB:000F 4B DEC BX
13DB:0010 E8F4FF CALL 0007
13DB:0013 5B POP BX
13DB:0014 F7E3 MUL BX
13DB:0016 C3 RET
-r
AX=0000 BX=0000 CX=0017 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CB ES=13CB SS=13DB CS=13DB IP=0000 NV UP EI PL NZ NA PO NC
13DB:0000 E80400 CALL 0007
-rbx
BX 0000
:8
-g

AX=9D80 BX=0008 CX=0017 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CB ES=13CB SS=13DB CS=13DB IP=0005 NV UP EI PL ZR NA PE NC
13DB:0005 EBF9 JMP 0000
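One way to check your answer to "Reason out why" is a Python model of FAC that tracks only AX, BX and a list standing in for the stack. The class `Regs` and the `save_n` switch are hypothetical, added only for this experiment; with `save_n=False` the PUSH BX / POP BX pair is skipped. Only the low word of each product is kept, as in AX.

```python
class Regs:
    """Just the state FAC uses: AX, BX and a stack for PUSH/POP."""

    def __init__(self, bx):
        self.ax = 0
        self.bx = bx
        self.stack = []


def fac(r, save_n=True):
    r.ax = 1                          # mov ax, 1
    if r.ax >= r.bx:                  # cmp ax, bx / jae return
        return
    if save_n:
        r.stack.append(r.bx)          # push bx
    r.bx -= 1                         # dec bx
    fac(r, save_n)                    # call fac
    if save_n:
        r.bx = r.stack.pop()          # pop bx
    r.ax = (r.ax * r.bx) & 0xFFFF     # mul bx (low word of the product)
```

With the push/pop pair the model ends with AX = 9D80h and BX = 8, as in the DEBUG run above. Without it, every level multiplies by the BX value left by the innermost call, which is 1, so the result collapses to 1.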

In Example 1 above, we can notice an interesting feature. In the subroutine there, the instruction CALL FA is immediately followed by the instruction RET (labelled RETURN). Such a CALL followed immediately by a RET can always be replaced by a simple JMP instruction. You may reason it out and make sure that it is so. If we make that change, the end of the FA subroutine is altered thus:

        JMP FA
RETURN: RET

In this form it is no longer a recursive routine. Its working is shown below.

-u 0 15
13DC:0000 B80100 MOV AX,0001
13DC:0003 BB0800 MOV BX,0008
13DC:0006 E80200 CALL 000B
13DC:0009 CD01 INT 01
13DC:000B 83FB01 CMP BX,+01
13DC:000E 7605 JBE 0015
13DC:0010 F7E3 MUL BX
13DC:0012 4B DEC BX
13DC:0013 EBF6 JMP 000B
13DC:0015 C3 RET
-r
AX=0000 BX=0000 CX=0016 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=0000 NV UP EI PL NZ NA PO NC
13DC:0000 B80100 MOV AX,0001
-g
AX=9D80 BX=0001 CX=0016 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=000B NV UP EI PL ZR NA PE NC
13DC:000B 83FB01 CMP BX,+01
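The equivalence can be checked with a Python sketch of FA, once as written (a recursive call immediately followed by a return) and once with the CALL/RET pair replaced by a jump back to the start, which becomes a `while` loop. The names `fa_recursive` and `fa_loop` are hypothetical; AX and BX are passed in and returned as a pair, and only the low 16 bits of each product are kept, as AX does.

```python
def fa_recursive(ax, bx):
    if bx <= 1:                        # cmp bx, 1 / jna return
        return ax, bx
    ax = (ax * bx) & 0xFFFF            # mul bx: multiply first, so n need not be saved
    bx -= 1                            # dec bx
    return fa_recursive(ax, bx)        # call fa, immediately followed by ret


def fa_loop(ax, bx):
    while bx > 1:                      # the CALL FA / RET pair replaced by JMP FA
        ax = (ax * bx) & 0xFFFF
        bx -= 1
    return ax, bx
```

Both give AX = 9D80h and BX = 1 for n = 8, matching the two DEBUG runs; the loop form simply uses no stack space per step.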

We have seen above how parameters can be passed through registers, with the stack used to hold them temporarily so that the parameters of one call are not overwritten by those of the next. However, the most common and versatile method that can be used in the 8086 processor for recursion is to pass the parameters directly through the stack, rather than merely using the stack to save parameters inside the subroutine. This is, for example, the standard method used by C compilers for general subroutine handling. The stack has a limitation, of course: since SP cannot be used as a base register in 8086 addressing modes, the stack by itself does not allow the parameters to be accessed randomly, as the operations of the program may require. To make random access possible, a separate register other than the stack pointer is provided: BP, the base pointer. Memory references that use BP as the base register default to the stack segment, precisely so that BP can address a parameter array in the stack. The main (calling) program pushes the parameters onto the stack; the subroutine accesses them randomly as required, using the BP register; and on return, the calling program can retrieve the results from the stack with pop operations.

We build a separate structure in the stack, called the stack frame, and put into it the parameters to be passed, including space for the output desired from the subroutine. To do this, the main program first provides space for the output variable by subtracting the required number of bytes from the stack pointer, then pushes the input variables, and then calls the subroutine. The first thing the subroutine does is to push BP onto the stack and copy the SP value into BP. BP now becomes the frame pointer; the region from the saved BP, through the return address and the input parameters, up to and including the output space, is the stack frame. The frame can be further extended by providing space for local variables of the subroutine which may have to be referred to a number of times; this space is provided by subtracting the appropriate number of bytes from the stack pointer. The stack space above this is then available to the subroutine as a regular stack. This separation of the stack space into a frame and a working stack gives a disciplined approach to the problem of passing parameters to subroutines.
Accordingly, the recursive subroutine for the factorial is as shown below. Note that the input and output parameters in the stack frame are referred to in the subroutine by indexed addressing using BP with a positive displacement, while local variables use a negative displacement. The stack beyond the frame is available to the subroutine as an ordinary LIFO stack. On return, the procedure simply moves BP into SP, pops BP to restore the caller's frame pointer, and then executes RET n, where n is the number of bytes of input parameters to be discarded from the stack. (The program shown uses a plain RET, followed in the caller by ADD SP,2, which is another way of doing the same thing.) The output of the subroutine can then simply be popped off the stack in the main program. In effect, the parameters are pushed in the calling program and recalled using BP-relative addressing in the called program; on return, the results are popped off in the main program. Based on this philosophy, the recursive factorial program is as follows:

Details of Passing Parameters to a Subroutine Using Stack Arrays or Stack Frames.

n equ 8                 ; value of n for this run
Code segment
Assume cs:code
Main:   Mov AX, n       ; (Choose n in the range 0 to 8 only.)
        Sub SP,2        ; make space for one word output (Factorial value)
        Push AX         ; input parameter to the stack.
        Call Fact       ; call the recursive routine.
        Add SP,2        ; clear the stack of the input,
                        ; to undo the Push AX above
        Pop AX          ; get the output result in reg. AX.
        Int 01          ; return control to DEBUG.

Fact proc near          ; the recursive procedure here.
        Push BP         ; BP is the frame pointer (defaulting to the Stack Segment),
                        ; so save its old value belonging to the calling
                        ; process, and make space to have its new value.
        Mov BP,SP       ; BP is now the (new) frame pointer for the subroutine.
                        ; no temporary variable required, so no Sub SP,n
                        ; is needed in this subroutine.
        Push AX         ; stack space of subroutine utilized to save regs. used.
        Push BX
        Push DX         ; save regs. used.
        Mov BX,[BP+4]   ; the input parameter, n, is at [BP+4]; see Table 4.1
        Mov AX,1        ; prepare for checking termination condition.
        Cmp AX,BX       ; check for termination.
        Jae Term        ; if AX is above or equal to BX, that is, if n = 0 or 1,
                        ; go to terminate; the result is already in AX.
        Dec BX          ; else, prepare to recursively call Fact (n-1)
        Sub SP,2        ; make space for output of the procedure for Fact (n-1).
        Push BX         ; input to the procedure to find Fact (n-1).
        Call Fact       ; recursive call.
        Add SP,2        ; discard the input variable of the called procedure.
        Pop AX          ; get Fact (n-1) in AX, output of called procedure.
        Mov BX,[BP+4]   ; get the input variable to this procedure, n, in BX.
        Mul BX          ; the product n*Fact (n-1) goes to DX:AX; DX will be
                        ; zero here, as we are limiting the result to 16 bits,
                        ; but MUL overwrites DX all the same, which is why DX
                        ; is saved across the procedure; result now in AX.
Term:   Mov [BP+6],AX   ; store the result from AX in the output space provided
                        ; in the stack frame. Note, in case n is 1 or 0, we
                        ; directly come here with the result 1 in AX.
        Pop DX          ; termination ritual starts from here: retrieve saved
        Pop BX          ; registers from the stack,
        Pop AX          ; and clean the procedure stack.
        Mov SP,BP       ; not really necessary here, as the stack is already
                        ; clean, but it is good practice to have this in the
                        ; procedure to make doubly sure the stack is cleaned.
        Pop BP          ; get back the original BP of the calling program.
        Ret             ; procedure over, go back to the calling program.
Fact endp
Code ends
end Main

The Intel 8086 provides the instruction RET n, which adds the value n to the stack pointer after popping the return address; it is meant for exactly this situation. If we use Ret 2 instead of Ret in the above program, the instruction Add SP,2 following the call instruction, in both the main program and the procedure, can be eliminated. The table below shows the stack frame for this program.

Table 4.1: Stack Frame after the 2nd Instruction of the Procedure

Memory address pointer    Memory contents
New BP = SP               Old BP of calling program
New BP + 2                Return address of calling program (for a near call)
New BP + 4                Input to the called subroutine, n
New BP + 6                Space for output value of Fact

Note that no local variable is required in this subroutine, so there is nothing in the stack frame above the old BP. If local variables were stored above the old BP in the stack frame, the instruction mov SP,BP in the termination part of the procedure would clear the stack of those variables; hence, as a general rule, it is safe to use that instruction during the termination process. The space above the frame is usable in the routine as a normal stack, for saving registers, for further calls of nested routines, and so on. The stack frame in this example remains as compact as possible, and to determine the offsets of the parameters we need to consider only one frame, without bothering about the nested frames.
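The frame discipline can be traced with a small Python model of the Fact procedure and its caller. It is a sketch under stated assumptions: the class `Stack`, the constant `RETURN_MARKER` (a stand-in for the return address that CALL pushes) and the starting SP of 1000h are all hypothetical, and the Push AX/BX/DX register saves are left out because they lie above the old BP and do not change the offsets [BP+4] and [BP+6].

```python
RETURN_MARKER = 0xCA11   # hypothetical stand-in for the return address pushed by CALL


class Stack:
    """Word-addressed stack memory that grows downward, plus SP and BP."""

    def __init__(self, sp=0x1000):
        self.mem = {}
        self.sp = sp
        self.bp = 0

    def push(self, value):
        self.sp -= 2
        self.mem[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.mem[self.sp]
        self.sp += 2
        return value


def fact(s):
    """Entered with n at [SP] and an empty output word at [SP+2]."""
    s.push(RETURN_MARKER)                 # CALL Fact pushes the return address
    s.push(s.bp)                          # Push BP
    s.bp = s.sp                           # Mov BP,SP (frame as in Table 4.1)
    if 1 >= s.mem[s.bp + 4]:              # Mov BX,[BP+4] / Mov AX,1 / Cmp AX,BX / Jae Term
        result = 1
    else:
        s.sp -= 2                         # Sub SP,2: output space for Fact (n-1)
        s.push(s.mem[s.bp + 4] - 1)       # Dec BX / Push BX
        fact(s)                           # Call Fact
        s.sp += 2                         # Add SP,2: discard the input word
        sub_result = s.pop()              # Pop AX: Fact (n-1)
        result = (sub_result * s.mem[s.bp + 4]) & 0xFFFF  # Mov BX,[BP+4] / Mul BX
    s.mem[s.bp + 6] = result              # Term: Mov [BP+6],AX
    s.sp = s.bp                           # Mov SP,BP
    s.bp = s.pop()                        # Pop BP
    s.pop()                               # Ret pops the return address


def main(n):
    s = Stack()
    start_sp = s.sp
    s.sp -= 2                             # Sub SP,2: space for the result
    s.push(n)                             # Push AX
    fact(s)                               # Call Fact
    s.sp += 2                             # Add SP,2
    result = s.pop()                      # Pop AX
    assert s.sp == start_sp, "stack must be balanced after the call"
    return result
```

For example, `main(8)` returns 40320 (9D80h). Note that n is re-read from [BP+4] after the recursive call rather than kept in a Python variable, mirroring the Mov BX,[BP+4] in the listing.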

Table 4.2 explains, step by step, the operation of passing parameters through a stack frame.

Re-entrant programs, however, have no alternative to a process similar to the one discussed for recursion. A stack frame is the best way of meeting their requirements: every entry creates a stack frame that preserves the parameters and the local variables of the procedure in a memory region separate from (not overlapping with) that of the next call to the same procedure.

Table 4.2 : Passing parameters through stack frames to subroutines

Calling program:
Step 1: Decrement the stack pointer suitably to accommodate the result output.
Step 2: Push all the input parameters.
Step 3: Call the subroutine (→ control passes to Step 1 of the called subroutine).
Step 4: On return from the subroutine, clear the input parameters from the stack by incrementing the stack pointer appropriately, to undo Step 2 above.
Step 5: Get the results of the subroutine by popping them off the stack into registers or memory locations, and use them as required.

Called subroutine:
Step 1: Push the frame pointer.
Step 2: Move SP to the frame pointer.
Step 3: Decrement the stack pointer suitably to provide space for temporary or local variables of the subroutine (which may need to be accessed in random order), to complete the stack frame.
Step 4: Do the subroutine job, using the stack space above the frame as the stack for the subroutine. Use indexed addressing with the frame pointer as the base register to obtain the parameters as required, and to store the output variables of the subroutine in the space provided in the stack frame.
Step 5: When the subroutine job is done, clear the subroutine stack by moving the frame pointer value to the stack pointer, and get back the original frame pointer.
Step 6: Return to the calling program (← control returns to Step 4 of the calling program).

It should be noted, however, that the factorial program we have taken uses only two or three registers, and hence its parameters could well be passed through registers in this case. Below, we have a recursive program for factorial calculation using this idea.

Fig 4.1: Flowchart for the recursive Factorial Procedure for N!
[Flowchart: PROC FACT → CREATE STACK FRAME → N ≤ 1? If yes: FACT = 1. If no: N = N – 1, CALL FACT (N – 1)!, FACT = N*FACT. In both cases: UNDO THE STACK FRAME → RETURN.]

Example: Write a recursive program to compute nCr, the number of combinations of n items taken r at a time, passing parameters through the stack. Use the relation nCr = (n-1)Cr + (n-1)C(r-1), with the terminating conditions nCn = nC0 = 1.

SOLUTION: A single word pushed on the stack carries both input parameters n and r: n = 18 (12 hex) is in the high byte of the word and r = 9 in the low byte. The input is chosen to limit the output to 16 bits. One word of space is left for the output in the stack frame by the instruction sub sp,2 on line no. 2. Try to understand how the program works with the help of the comments given.

Note that this program is not in the format for assembling with MASM. It is the .lst listing obtained from NASM, an assembler freely downloadable from the net. However, the difference is not much, and the .asm version required for MASM can easily be visualized from this listing. See also Q5 at the end of the chapter exercises for a brief introduction to NASM and to macros in NASM.

 1                          ; Main Program for finding nCr recursively.
 2 00000000 81EC0200        sub sp,2
 3 00000004 B80912          mov ax,1209h     ; n = 12h or 18, and r = 9.
 4 00000007 50              push ax
 5 00000008 E80300          call ncrp        ; call to the routine at line 8.
 6 0000000B 58              pop ax           ; get result in ax.
 7 0000000C CD01            int 01           ; return control to DEBUG.
 8 0000000E 55              ncrp: push bp    ; procedure ncrp: recursive routine here.
 9 0000000F 89E5            mov bp,sp
10 00000011 81EC0200        sub sp,2         ; space for temp variable of the procedure.
11 00000015 50              push ax
12 00000016 53              push bx
13 00000017 8B5E04          mov bx,[bp+4]    ; parameters passed from the calling program.
14 0000001A B80100          mov ax,1
15 0000001D 38DF            cmp bh,bl
16 0000001F 7427            jz over
17 00000021 08DB            or bl,bl         ; is bl = 0?
18 00000023 7423            jz over
19 00000025 81EC0200        sub sp,2
20 00000029 FECF            dec bh
21 0000002B 53              push bx
22 0000002C E8DFFF          call ncrp        ; calculate (n-1)Cr
23 0000002F 5B              pop bx
24 00000030 895EFE          mov [bp-2],bx    ; store partial result temporarily.
25 00000033 8B5E04          mov bx,[bp+4]    ; recall the parameters
26 00000036 FECF            dec bh
27 00000038 FECB            dec bl
28 0000003A 81EC0200        sub sp,2
29 0000003E 53              push bx
30 0000003F E8CCFF          call ncrp        ; compute (n-1)C(r-1)
31 00000042 58              pop ax
32 00000043 8B5EFE          mov bx,[bp-2]
33                          ; partial result stored in temp variable space of stack frame
34 00000046 01D8            add ax,bx
35 00000048 894606          over: mov [bp+6],ax  ; store result in stack space
36 0000004B 5B              pop bx
37 0000004C 58              pop ax
38 0000004D 89EC            mov sp,bp        ; clean the stack
39 0000004F 5D              pop bp
40 00000050 C20200          ret 2

The subroutine leaves space for one word of local variable (sub sp,2 in line no. 10). The sub sp,2 instructions in lines 19 and 28 leave space for the output of each recursive call. Lines 15 to 18 check for the terminating conditions. The rest of the program can be identified with the steps of the flowchart of Fig. 4.1, except that there are two recursive calls instead of one.

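To count the number of times the subroutine is called, you can use SI:CX as a 32-bit counter, set to 0 initially in the main program and incremented before the return instruction in the subroutine ncrp. You may be surprised by the result! It comes to as much as 97239 decimal, whereas the value of 18C9 itself is only 48620 decimal. In fact the count is 2 × 48620 – 1: every call either terminates, returning 1, or makes exactly two further calls, so the 48620 terminating calls, whose 1s add up to 18C9, are accompanied by 48619 non-terminating ones.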

The use of an additional set of terminal conditions, namely, nCn-1 = nC1 = n, will certainly improve the execution time and stack memory requirement (the count here comes to only 25836 decimal).
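The call counts can be explored with a Python model of the recursion (a sketch, not the NASM program; the function name `ncr_calls` and its `extended` switch are hypothetical). It returns the value of nCr together with the number of calls made, counting each invocation once, as the SI:CX counter does.

```python
def ncr_calls(n, r, extended=False):
    """Return (nCr, number of calls) for nCr = (n-1)Cr + (n-1)C(r-1),
    stopping at nCn = nC0 = 1 and, if extended, also at nC1 = nC(n-1) = n."""
    if r == 0 or r == n:
        return 1, 1
    if extended and (r == 1 or r == n - 1):
        return n, 1
    v1, c1 = ncr_calls(n - 1, r, extended)
    v2, c2 = ncr_calls(n - 1, r - 1, extended)
    return v1 + v2, 1 + c1 + c2
```

The basic version reproduces 97239 calls for 18C9. For the extended version this model gives 25739 calls (its call tree has 16C8 = 12870 terminating calls, and every call tree of this kind has an odd total); a hand-coded version that arranges its checks or its counter differently may report a somewhat different figure.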

In normal non-recursive and non-re-entrant situations, parameters can be passed through registers. The following is an example of a non-recursive program using registers for parameter passing. It is written as a near procedure and performs a 32-bit by 32-bit multiplication, giving a 64-bit product. The program is given below with adequate comments.

An Example of Passing Parameters to Subroutines through Registers

; The process in terms of 16-bit data:
; (a:b)*(c:d) = acH : (acL + bcH + adH) : (bcL + adL + bdH) : bdL
; the numbers to be multiplied are first made available in the registers as
; indicated in the comments at the start of the procedure.
; Steps are: 1. Save registers used; 2. Do the job; 3. Retrieve saved regs.
code segment
assume cs:code
start:  call dmult
        int 01
dmult proc near
; input parameters: a in dx, b in ax, c in bx and d in cx
; numbers multiplied: a:b and c:d, i.e. (dx:ax)*(bx:cx)
; output in dx:cx:bx:ax; all other regs. are saved across the procedure
        push si
        push di
        push bp         ; save extra registers used
        mov si, ax      ; dx:ax has a:b, so b -> si
        mov di, dx      ; a -> di
        mul cx          ; (bd) -> dx:ax
        xchg ax, si     ; bdL -> si (final); and b -> ax
        mov bp, dx      ; bdH -> bp
        mul bx          ; (bc) -> dx:ax
        add bp, ax      ; bcL + bdH -> bp; carry1 is not disturbed & used later
        mov ax, cx      ; d -> ax
        mov cx, 0       ; 0 -> cx without disturbing the carry flag
        adc cx, dx      ; bcH + carry1 -> cx
        mul di          ; (ad) -> dx:ax
        add bp, ax      ; bcL + bdH + adL -> bp; carry2 could be there
        adc cx, dx      ; bcH + carry1 + adH + carry2 -> cx; carry3 could be there
        mov ax, di      ; a -> ax
        mov di, 0       ; carry3 not disturbed, 0 -> di
        adc di, di      ; carry3 -> di
        mul bx          ; (ac) -> dx:ax
        add cx, ax      ; bcH + carry1 + adH + carry2 + acL -> cx; carry4, may be
                        ; cx now has the final result
        adc dx, di      ; acH + carry3 + carry4 -> dx
        mov bx, bp      ; arrange the results in bx
        mov ax, si      ; and in ax as required
        pop bp          ; retrieve saved registers
        pop di
        pop si
        ret
dmult endp
code ends
end start
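The column arithmetic of DMULT can be checked against ordinary multiplication with a short Python model. This is a sketch: the name `dmult` mirrors the procedure, but the carries of each column are added in one step here rather than by the ADD/ADC sequence of the listing, which gives the same result.

```python
MASK16 = 0xFFFF


def dmult(a, b, c, d):
    """Model of DMULT: (a:b) * (c:d), each letter a 16-bit word.
    Returns the product words (dx, cx, bx, ax), most significant first."""
    bd, bc, ad, ac = b * d, b * c, a * d, a * c    # the four MUL instructions
    ax = bd & MASK16                                # bdL
    col = (bd >> 16) + (bc & MASK16) + (ad & MASK16)
    bx = col & MASK16                               # bdH + bcL + adL
    col = (col >> 16) + (bc >> 16) + (ad >> 16) + (ac & MASK16)
    cx = col & MASK16                               # carry1 + carry2 + bcH + adH + acL
    dx = ((col >> 16) + (ac >> 16)) & MASK16        # carry3 + carry4 + acH
    return dx, cx, bx, ax
```

For instance, FFFFFFFFh × FFFFFFFFh gives the words FFFF, FFFE, 0000, 0001, that is FFFFFFFE00000001h.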

Non recursively solving problems which are susceptible to recursive solution:

Recursive problems may also be solved non recursively. For example, factorial computation for n can be done by simply starting from 1 and keeping multiplying by successive integers until we reach n, alternatively, starting with n decrementing and multiplying until we reach 1. A skeleton program for the purpose is shown below:-a1377:0100 mov ax,11377:0103 mov cx,81377:0106 mul cx1377:0108 loop 1061377:010A int 01-r

118

AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC 1377:0100 B80100 MOV AX,0001 -t12

AX=0001 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0103 NV UP EI PL NZ NA PO NC 1377:0103 B90800 MOV CX,0008 AX=0001 BX=0000 CX=0008 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI PL NZ NA PO NC 1377:0106 F7E1 MUL CX AX=0008 BX=0000 CX=0008 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0108 NV UP EI PL NZ NA PO NC 1377:0108 E2FC LOOP 0106 AX=0008 BX=0000 CX=0007 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI PL NZ NA PO NC 1377:0106 F7E1 MUL CX AX=0038 BX=0000 CX=0007 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0108 NV UP EI PL NZ NA PO NC 1377:0108 E2FC LOOP 0106 AX=0038 BX=0000 CX=0006 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI PL NZ NA PO NC 1377:0106 F7E1 MUL CX AX=0150 BX=0000 CX=0006 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0108 NV UP EI PL NZ NA PO NC 1377:0108 E2FC LOOP 0106 AX=0150 BX=0000 CX=0005 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI PL NZ NA PO NC 1377:0106 F7E1 MUL CX AX=0690 BX=0000 CX=0005 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0108 NV UP EI PL NZ NA PO NC 1377:0108 E2FC LOOP 0106 AX=0690 BX=0000 CX=0004 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI PL NZ NA PO NC 1377:0106 F7E1 MUL CX AX=1A40 BX=0000 CX=0004 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0108 NV UP EI PL NZ NA PO NC 1377:0108 E2FC LOOP 0106 AX=1A40 BX=0000 CX=0003 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI PL NZ NA PO NC 1377:0106 F7E1 MUL CX AX=4EC0 BX=0000 CX=0003 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0108 NV UP EI PL NZ NA PO NC 1377:0108 E2FC LOOP 0106 AX=4EC0 BX=0000 CX=0002 
DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI PL NZ NA PO NC 1377:0106 F7E1 MUL CX

119

AX=9D80 BX=0000 CX=0002 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0108 NV UP EI PL NZ NA PO NC 1377:0108 E2FC LOOP 0106 AX=9D80 BX=0000 CX=0001 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI PL NZ NA PO NC 1377:0106 F7E1 MUL CX AX=9D80 BX=0000 CX=0001 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=0108 NV UP EI PL NZ NA PO NC 1377:0108 E2FC LOOP 0106 AX=9D80 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000 DS=1377 ES=1377 SS=1377 CS=1377 IP=010A NV UP EI PL NZ NA PO NC 1377:010A CD01 INT 01 -q
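A Python model of the skeleton reproduces the AX values seen in the trace after each MUL CX. It is a sketch: the function name `loop_factorial` is hypothetical, DX is not modelled (only the low word of each product is kept, as in AX), and n is assumed to be at least 1, because LOOP decrements CX before testing it, so a starting CX of 0 would run the loop 65536 times.

```python
def loop_factorial(n):
    """Model of: mov ax,1 / mov cx,n / L: mul cx / loop L.
    Returns the final AX and the AX value after every MUL."""
    ax, cx = 1, n & 0xFFFF
    trace = []
    while True:
        ax = (ax * cx) & 0xFFFF       # MUL CX: low word of the product -> AX
        trace.append(ax)
        cx = (cx - 1) & 0xFFFF        # LOOP first decrements CX ...
        if cx == 0:                   # ... and falls through only when CX reaches 0
            return ax, trace
```

For n = 8 the trace is 0008, 0038, 0150, 0690, 1A40, 4EC0, 9D80, 9D80, exactly the AX values after the eight MUL CX steps above.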

The advantage of recursion is that the recursive logic is normally simple. When the non-recursive logic is also simple, it is better to solve the problem non-recursively, since this avoids the call overhead and stack usage of recursion, as the nCr call counts above show. Certain problems, however, are difficult to visualize without recursion. The Tower of Hanoi is a very good example of such a problem; with recursion it is very simple to tackle. In such cases it becomes necessary to resort to recursion.
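To show how short the recursive Tower of Hanoi solution is, here is a Python sketch (the function name, the peg names "A", "B", "C" and the move format are hypothetical choices, not from the book). To move n discs, move n – 1 discs out of the way onto the spare peg, move the largest disc, and then move the n – 1 discs back on top of it.

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the list of moves (disc, from_peg, to_peg) for n discs."""
    if moves is None:
        moves = []
    if n > 0:
        hanoi(n - 1, source, spare, target, moves)   # park n-1 discs on the spare peg
        moves.append((n, source, target))            # move the largest disc
        hanoi(n - 1, spare, target, source, moves)   # bring the n-1 discs back on top
    return moves
```

`hanoi(2)` gives the three moves (1, A, B), (2, A, C), (1, B, C); in general n discs need 2^n – 1 moves. A non-recursive version exists but is considerably harder to see.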

EXERCISES

1. Study the method of handling local labels in macros, as demonstrated in section 1 on macros at the beginning of this chapter. Based on your study, answer the following question: A program uses 2 macros. Macro 1 has three local labels, while macro 2 has five. The main program invokes macro 1 5000 times. How many times can the program invoke macro 2 without running into problems with the identifiers for the local labels?

2. Here is another example of the power of macros: a single macro performs different operations depending on the operation passed to it as a parameter. Study the program carefully and note how the different issues of shifting multi-word data left and right by a variable number of bits are handled in the macro.

data segment
dat     dw 234h, 5678h, 89abh, 7604h, 0abc0h, 3 dup(0)
dat1    dw 8 dup(?)
dat2    dw 8 dup(?)
num     dw 5
data ends
;
code segment
assume cs: code, ds: data, es: data
sflr macro opr, mem, n, m, df
local bak, again
        df              ;; either 'std' for right shift
                        ;; or 'cld' for left shift
        mov dx, m
again:  mov cx, n
        lea di, mem
        clc
bak:    mov ax, [di]
        opr ax, 1
        stosw
        loop bak
        dec dx
        jnz again
endm
;; In the above macro, n is the no. of data words, m is the number of bits
;; of shift, and opr is rcl or rcr
;
strt:   mov ax, data
        mov ds, ax
        mov es, ax
        cld
        mov cx, num
        mov di, offset dat1
        mov si, offset dat
        rep movsw                       ; to copy data
        mov cx, num
        mov di, offset dat2
        mov si, offset dat
        rep movsw                       ; to copy data
        sflr rcl, dat1, num, 4, cld     ; data start address for 'rcl'
        sflr rcr, dat2+8, num, 4, std   ; data end address for 'rcr'
        int 1
code ends
end strt

STUDY IN DEBUG
-u 0 4e
13E0:0000 B8DC13 MOV AX,13DC
13E0:0003 8ED8 MOV DS,AX
13E0:0005 8EC0 MOV ES,AX
13E0:0007 FC CLD
13E0:0008 8B0E3000 MOV CX,[0030]
13E0:000C BF1000 MOV DI,0010
13E0:000F BE0000 MOV SI,0000
13E0:0012 F3 REPZ
13E0:0013 A5 MOVSW
13E0:0014 8B0E3000 MOV CX,[0030]
13E0:0018 BF2000 MOV DI,0020
13E0:001B BE0000 MOV SI,0000
13E0:001E F3 REPZ
13E0:001F A5 MOVSW
13E0:0020 FC CLD
13E0:0021 BA0400 MOV DX,0004
13E0:0024 8B0E3000 MOV CX,[0030]
13E0:0028 8D3E1000 LEA DI,[0010]
13E0:002C F8 CLC
13E0:002D 8B05 MOV AX,[DI]
13E0:002F D1D0 RCL AX,1
13E0:0031 AB STOSW
13E0:0032 E2F9 LOOP 002D
13E0:0034 4A DEC DX
13E0:0035 75ED JNZ 0024
13E0:0037 FD STD
13E0:0038 BA0400 MOV DX,0004
13E0:003B 8B0E3000 MOV CX,[0030]
13E0:003F 8D3E2800 LEA DI,[0028]
13E0:0043 F8 CLC
13E0:0044 8B05 MOV AX,[DI]
13E0:0046 D1D8 RCR AX,1
13E0:0048 AB STOSW
13E0:0049 E2F9 LOOP 0044
13E0:004B 4A DEC DX
13E0:004C 75ED JNZ 003B
13E0:004E CD01 INT 01
-g
AX=8023 BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=000A DI=001E
DS=13DC ES=13DC SS=13DC CS=13E0 IP=0050 NV DN EI PL ZR NA PE NC
13E0:0050 FFFF ??? DI
-d0 3f
13DC:0000 34 02 78 56 AB 89 04 76-C0 AB 00 00 00 00 00 00 4.xV...v........
13DC:0010 40 23 80 67 B5 9A 48 60-07 BC 00 00 00 00 00 00 @#.g..H`........
13DC:0020 23 80 67 B5 9A 48 60 07-BC 0A 00 00 00 00 00 00 #.g..H`.........
13DC:0030 05 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
;; the original number (m.s. word first) is: ABC0 7604 89AB 5678 0234
;; on 4 bit left shift, we get:              BC07 6048 9AB5 6780 2340
;; and on right shift, we get:               0ABC 0760 489A B567 8023
;; as can be seen from the result above.
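The macro's bit-by-bit passes can be cross-checked with a Python model that stores the multi-word number least significant word first, as in memory. It is a sketch: the function name `shift_words` is hypothetical, and the `left` flag stands in for the (opr, df) pair, choosing between (rcl, cld) and (rcr, std).

```python
def shift_words(words, bits, left=True):
    """Shift a multi-word number (least significant word first) by `bits`,
    one bit per pass, as the SFLR macro does with RCL or RCR."""
    words = list(words)
    # RCL works upward from the l.s. word (CLD); RCR downward from the m.s. word (STD).
    order = range(len(words)) if left else range(len(words) - 1, -1, -1)
    for _ in range(bits):            # DX counts the passes, one bit each
        carry = 0                    # CLC at the start of every pass
        for i in order:              # CX counts the words
            w = words[i]
            if left:                 # RCL AX,1
                words[i] = ((w << 1) & 0xFFFF) | carry
                carry = w >> 15
            else:                    # RCR AX,1
                words[i] = (w >> 1) | (carry << 15)
                carry = w & 1
    return words
```

Applied to the five data words 0234, 5678, 89AB, 7604, ABC0 with a shift of 4, it produces the same words that the memory dump shows at offsets 10h (left shift) and 20h (right shift).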

3. In the program of Q2 above, the programmer's responsibility in invoking the macro is a bit complex. If the opr chosen is RCL, the address of the l.s. word of the data is to be given as mem, and CLD as the df parameter. If the opr is RCR, mem should point to the address of the m.s. word of the data, and df should be STD. These have to be coordinated properly, or the program will not work. Can you suggest a method to simplify this coordination? One way of doing it is suggested below; study the method and determine how it will work. Suggest alternatives, if you have any.

Sflr macro opr, k, mem, n, m
Local bak, again
        Mov bx, n
        Mov dx, m
        Mov si, offset mem
        Cld             ;; unless changed for RCR
        Mov ax, k
        Or ax, ax
        Jz again
        Std             ;; for RCR operation.
        Mov cx, bx
        Dec cx
        Shl cx, 1
        Add si, cx      ;; this works out to the m.s. word address for RCR op.
Again:  mov cx, bx
        Mov di, si
        clc
Bak:    Mov ax, [di]
;; from now on, proceed identical to the original program.

In this program, we bundle the parameters (opr, k) as (rcl, 0) for left shift or as (rcr, 1) for right shift, which is easier on the programmer. The mem value can be the start address in both cases, and the direction flag is adjusted based on the control parameter k. Sometimes the value of n may not be known at the time of writing the program; it may be a word computed and stored during execution. Since the modified macro computes the m.s. word address from n at run time, such cases are also accommodated.

4. The following program is given without any comments. Find out what it does and test the working of the program.

data segment
mnd     dw 8 dup (1578h)
spare   dw 0F8h dup (?)
sud     dw 1234h, 0abcdh, 2389h, 5874h
        dw 9876h, 4567h, 0bcdh
        dw 9567h
spare2  dw 10h dup (?)
rest    dw 9 dup (?)
data ends
code segment
assume cs:code, ds:data, es:data
subt macro minuend, subtrahend, result, n
local lup
        mov si, offset minuend
        mov di, offset result
        mov cx, n
        clc
        cld
lup:    lodsw
        sbb ax, [subtrahend-minuend-2][si]
        stosw
        loop lup
        sbb ax, ax
        mov [di], ax
endm
strt:   mov ax, data
        mov ds, ax
        mov es, ax
        subt mnd, sud, rest, 8
        int 1
code ends
end strt

5. Below is an assembly language version of the smart divide macro, which we saw under point no. 6 earlier in this chapter. This version can be assembled using the freely available assembler NASM (the Netwide Assembler), which can be downloaded directly from the net; its user manual is also freely downloadable. The program given here matches the one we saw earlier for assembly by MASM. Note the features of macros in NASM and the overall simplicity of the .asm program. List the differences between the assembly programs for the two assemblers; in particular, note the interesting way in which the parameters of the macro are handled, and how the local labels of the macro are handled.

THE SMARTDIV.ASM FILE

%macro smdiv 4
        sub %4, %4
        cmp %1, %3
        jb %%down
        xchg %4, %1
        xchg %4, %2
        div %3
        xchg %4, %2
%%down: div %3
        xchg %4, %1
%endmacro
;
start:  mov dx, 0x0abc
        mov ax, 0x1234
        mov bx, 0x45ab
        smdiv dx, ax, bx, cx
        mov bx, 0xdabc
        mov ax, 0x2345
        mov cl, 35
        smdiv ah, al, cl, bl
        int 01

THE SMARTDIV.LST FILE OBTAINED FROM THE SMARTDIV.ASM FILE

The command used is: nasm -l smartdiv.lst smartdiv.asm. The file also shows the assembled program starting at the origin 0000 in the code segment.

 1                              %macro smdiv 4
 2                              sub %4, %4
 3                              cmp %1, %3
 4                              jb %%down
 5                              xchg %4, %1
 6                              xchg %4, %2
 7                              div %3
 8                              xchg %4, %2
 9                              %%down: div %3
10                              xchg %4, %1
11                              %endmacro
12                              ;
13 00000000 BABC0A              start: mov dx, 0x0abc
14 00000003 B83412              mov ax, 0x1234
15 00000006 BBAB45              mov bx, 0x45ab
16                              smdiv dx, ax, bx, cx
17 00000009 29C9            <1> sub %4, %4
18 0000000B 39DA            <1> cmp %1, %3
19 0000000D 7206            <1> jb %%down
20 0000000F 87CA            <1> xchg %4, %1
21 00000011 91              <1> xchg %4, %2
22 00000012 F7F3            <1> div %3
23 00000014 91              <1> xchg %4, %2
24 00000015 F7F3            <1> %%down: div %3
25 00000017 87CA            <1> xchg %4, %1
26 00000019 BBBCDA              mov bx, 0xdabc
27 0000001C B84523              mov ax, 0x2345
28 0000001F B123                mov cl, 35
29                              smdiv ah, al, cl, bl
30 00000021 28DB            <1> sub %4, %4
31 00000023 38CC            <1> cmp %1, %3
32 00000025 7208            <1> jb %%down
33 00000027 86DC            <1> xchg %4, %1
34 00000029 86D8            <1> xchg %4, %2
35 0000002B F6F1            <1> div %3
36 0000002D 86D8            <1> xchg %4, %2
37 0000002F F6F1            <1> %%down: div %3
38 00000031 86DC            <1> xchg %4, %1
39 00000033 CD01                int 01
40

The program can be assembled using the command: nasm -o smartdiv.com smartdiv.asm. The resulting machine language program can be executed in DEBUG, but it will then be at the offset 100h in the DEBUG environment, as shown below. It can easily be verified that the two machine language programs (the one we saw earlier and the one here) are identical.

-u 100 133
13CC:0100 BABC0A MOV DX,0ABC
13CC:0103 B83412 MOV AX,1234
13CC:0106 BBAB45 MOV BX,45AB
13CC:0109 29C9 SUB CX,CX
13CC:010B 39DA CMP DX,BX
13CC:010D 7206 JB 0115
13CC:010F 87CA XCHG CX,DX
13CC:0111 91 XCHG CX,AX
13CC:0112 F7F3 DIV BX
13CC:0114 91 XCHG CX,AX
13CC:0115 F7F3 DIV BX
13CC:0117 87CA XCHG CX,DX
13CC:0119 BBBCDA MOV BX,DABC
13CC:011C B84523 MOV AX,2345
13CC:011F B123 MOV CL,23
13CC:0121 28DB SUB BL,BL
13CC:0123 38CC CMP AH,CL
13CC:0125 7208 JB 012F
13CC:0127 86DC XCHG BL,AH
13CC:0129 86D8 XCHG BL,AL
13CC:012B F6F1 DIV CL
13CC:012D 86D8 XCHG BL,AL
13CC:012F F6F1 DIV CL
13CC:0131 86DC XCHG BL,AH
13CC:0133 CD01 INT 01
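The logic of SMDIV can be modelled in Python to see why it never overflows. This is a sketch: the function name `smart_divide` and its `width` parameter (16 for the word form, 8 for the byte form) are hypothetical. When the high part is below the divisor, one DIV is enough, because the quotient then fits in one register; otherwise the high part is divided first and its remainder is carried into a second DIV with the low part.

```python
def smart_divide(high, low, divisor, width=16):
    """Divide the (2*width)-bit number high:low by divisor without DIV overflow.
    Returns (quotient_high, quotient_low, remainder): DX, AX, CX for the word
    form of SMDIV, or AH, AL, BL for the byte form."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if high < divisor:
        # jb %%down: a single DIV suffices, the quotient fits in `width` bits
        q_low, rem = divmod((high << width) | low, divisor)
        return 0, q_low, rem
    # divide the high part first, then divide remainder:low
    q_high, r_high = divmod(high, divisor)
    q_low, rem = divmod((r_high << width) | low, divisor)
    return q_high, q_low, rem
```

The byte example in the listing (AH = 23h, AL = 45h, CL = 35) takes the second path, since AH is not below CL; the model gives AH = 1, AL = 1, BL = 34, i.e. 2345h / 35 = 257 remainder 34, a quotient that a single byte DIV could not hold.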



5. SOME SIMPLE NUMBER-CRUNCHING AND INTERRUPT PROGRAMS

Having studied the basics of programming in the previous chapters, we shall now look at some simple number-crunching programs. Many of these routines are of general use, and in such cases they can be converted into procedures. The first program, for finding the GCD, is written directly as a procedure; you can develop your own method of testing it.

The science of programming lies in making non-working programs work. Errors in the ALP itself are reported while assembling. Often the assembler's error indication is difficult to understand: the error is tersely noted as Error no. xxxx, with a short description, and sometimes the description may even mislead the uninitiated. As a simple example, suppose that in the data segment you try to define the dividend as a word of value ABCDh, labeling it dd, as shown below.

The ALP try.asm:

data segment
dd dw abcdh      ; line 2 of try.asm: the intention is to have the symbol ‘dd’ defined by the word abcdh, i.e. dd (symbol) dw (define word) abcdh
data ends
code segment
assume cs:code, ds:data
mov ax, data
mov ds, ax
mov ax, dd       ; line 8 of try.asm
int 1
code ends
end

Result of assembling try.asm using MASM:

Microsoft (R) Macro Assembler Version 5.10A
Copyright (C) Microsoft Corp 1981, 1989.  All rights reserved.

try.ASM(2): error A2009: Symbol not defined: DW
try.ASM(8): error A2009: Symbol not defined: DD

  48212 Bytes symbol space free

      0 Warning Errors
      2 Severe  Errors

What has happened is that MASM has taken ‘dd’ on line 2 as the reserved word that defines a doubleword (32-bit word) and expects the doubleword value to follow. What it sees instead is ‘dw’, which it interprets as a label defined elsewhere; no such label is found, and this is reported as an error. This is difficult for an inexperienced user to make out. The user would like an indication that the error lies in using the reserved word ‘dd’ as a symbol for a data word. Instead, MASM accepts the first word as a valid reserved word and therefore treats the second word as an undefined symbol. If you correct this error by using the symbol dvd for the dividend, you will find that line 2 still has an error. Try this out in the laboratory until you get the program to assemble without error! Most of the time, however, the error indications are easily understood. When they are not, one has to try different possible alternatives on the erroneous line, and on other lines related to it, until the fault is correctly identified. It takes some practice before these aspects are properly understood.

Once the machine-language program has been obtained by running LINK after MASM, the program can be debugged. At the debug stage too there may be, and quite likely will be, problems; these have to be tracked down by judicious use of the DEBUG commands ‘t’, ‘g’, ‘p’, ‘d’ and others. The procedures given below have been tested, and the working test results are shown. Still, when you work through them in the laboratory, there may be errors in your program entry. Until you can turn possibly wrong programs into solidly working ones, you have not learnt programming well; the purpose of the microprocessor laboratory is to impart this sort of training. When you are working in a team in the laboratory, one team member can deliberately introduce errors into the assembly or debug version of the program, unseen by the other members, who then try to identify the errors. This can be played as a game, and you will slowly find your interest in programming picking up. In any case, whenever you notice an error in your program, record the error and how you corrected it. Programming is learnt in the laboratory by studying such errors, avoiding them in future, and gaining confidence in handling errors through awareness of the common ones. Here are a few working procedures and programs.

1. A procedure for finding the GCD of two 16-bit numbers in the AX and BX registers: The procedure treats the numbers as unsigned. It tests for a zero in either input and, if one or both inputs are zero, returns with the carry flag set to indicate invalid data. For valid data, the GCD is returned in AX, with the carry flag cleared to indicate valid data.

Program for finding the GCD of two 16-bit numbers:

; Input: numbers in the ax and bx registers. Output in the ax register, when carry
; is returned clear. If carry is returned set, it indicates that one or both of the
; input data are zero. The program uses the dx register in addition to the input
; registers ax and bx. The rest of the registers are left intact.
GCD     proc near
        cmp ax, bx
        jae down
        xchg ax, bx
down:   or bx, bx           ; bx will now be the smaller of the two numbers
        jz invalid          ; if it is zero, the data is invalid
        push dx             ; else, save the register used
                            ; note: push is done only for valid data
again:  sub dx, dx          ; prepare for word division
        div bx              ; no divide overflow possible, as the dividend itself is 16 bits
        mov ax, bx
        mov bx, dx
        or dx, dx           ; note: this also clears the carry
        jnz again
        pop dx
        ret
invalid: stc
        ret
GCD     endp

Testing in DEBUG:

-u 0 1e
13D5:0000 E80200        CALL    0005
13D5:0003 CD01          INT     01
13D5:0005 3BC3          CMP     AX,BX       ; procedure from here; Euclid's algorithm
13D5:0007 7301          JNB     000A
13D5:0009 93            XCHG    BX,AX
13D5:000A 0BDB          OR      BX,BX       ; bx carries the smaller number here
13D5:000C 740F          JZ      001D        ; if bx = 0, invalid data
13D5:000E 52            PUSH    DX          ; save register used
13D5:000F 2BD2          SUB     DX,DX       ; prepare for word division
13D5:0011 F7F3          DIV     BX
13D5:0013 8BC3          MOV     AX,BX
13D5:0015 8BDA          MOV     BX,DX
13D5:0017 0BD2          OR      DX,DX       ; remainder = 0? this also clears carry
13D5:0019 75F4          JNZ     000F        ; if not, repeat; else job over, GCD is in AX
13D5:001B 5A            POP     DX          ; retrieve saved DX before return
13D5:001C C3            RET
13D5:001D F9            STC                 ; DX not saved and not used in this path
13D5:001E C3            RET

Test on normal data:

-rax
AX 0000
:1234
-rbx
BX 0000
:1324
-rdx
DX 0000
:1111                 ; data used to test whether DX is saved across the routine
-r
AX=1234  BX=1324  CX=001F  DX=1111  SP=0000  BP=0000  SI=0000  DI=0000
DS=13C5  ES=13C5  SS=13D5  CS=13D5  IP=0000   NV UP EI PL NZ NA PO NC
13D5:0000 E80200        CALL    0005
-g 3
AX=0014  BX=0000  CX=001F  DX=1111  SP=0000  BP=0000  SI=0000  DI=0000
DS=13C5  ES=13C5  SS=13D5  CS=13D5  IP=0003   NV UP EI PL ZR NA PE NC
13D5:0003 CD01          INT     01

Test on invalid data (BX = 0):

-rip
IP 0003
:0
-r
AX=0014  BX=0000  CX=001F  DX=1111  SP=0000  BP=0000  SI=0000  DI=0000
DS=13C5  ES=13C5  SS=13D5  CS=13D5  IP=0000   NV UP EI PL NZ NA PO NC
13D5:0000 E80200        CALL    0005
-g 3
AX=0014  BX=0000  CX=001F  DX=1111  SP=0000  BP=0000  SI=0000  DI=0000
DS=13C5  ES=13C5  SS=13D5  CS=13D5  IP=0003   NV UP EI PL ZR NA PE CY
13D5:0003 CD01          INT     01
-q

Certain features introduced in this program are of general interest. One of them is the check on the input data at the beginning of the program. Programs normally work correctly only while the data lies within certain limits; if the input data goes beyond these limits, the program runs into problems. It is therefore good practice to check the input data against these limits. Data outside the allowable limits should not be processed at all; instead, the output should indicate that the data is invalid. This indication can be given in several ways. Normally a message is displayed on the screen or printed. A simple way of meeting this requirement is to use one of the flags in the flag register, and the carry flag, as used here, is the usual choice. When the procedure returns, all the main program needs to do is check the carry flag: a jump-on-carry instruction conveniently alters the program flow to handle the error. This method is very common with procedures and interrupt routines. Another feature of this program is that DX is pushed and popped only when the data is valid; for invalid data no calculation is done, so no push or pop is needed either.
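As a cross-check of the test results above, here is a hypothetical Python model of the GCD procedure. It assumes the same register conventions as the text; the returned flag `carry` stands in for CF, and the caller's `if carry` test plays the role of a jc instruction in the main program.

```python
def gcd_proc(ax, bx):
    """Model of the GCD procedure. Returns (ax, carry): carry False means
    AX holds the GCD; carry True means one or both inputs were zero."""
    if ax < bx:                  # cmp ax, bx / jae down
        ax, bx = bx, ax          # xchg ax, bx
    if bx == 0:                  # or bx, bx / jz invalid
        return ax, True          # stc / ret (DX is never touched on this path)
    while True:
        dx = ax % bx             # sub dx, dx / div bx: remainder in DX
        ax, bx = bx, dx          # mov ax, bx / mov bx, dx
        if dx == 0:              # or dx, dx clears CF; jnz again not taken
            return ax, False     # pop dx / ret


# A caller checks the carry, as a main program would with jc.
for a, b in [(0x1234, 0x1324), (0x0014, 0x0000)]:
    result, carry = gcd_proc(a, b)
    print("invalid data" if carry else f"GCD = {result:04X}h")
```

The first call prints GCD = 0014h and the second reports invalid data, in line with the two DEBUG runs shown above.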

Exercise: Extend the above program to give the LCM of the input numbers, using the well-known relation LCM(n1, n2) = n1 × n2 / GCD(n1, n2). Hint: there are enough registers available that the following steps can be followed:

1. Save n1 and n2 in, say, the si and di registers.
2. Find the GCD of n1 and n2 in ax as has been shown, and move it to bx.
3. Take n1 in ax (from si), make dx = 0 and word-divide n1 by the GCD.
4. The result will now be in ax with nothing in dx (why?); multiply this result by di.

The LCM will be in dx:ax and the GCD will be in bx.

This is perhaps the best method for finding the LCM, even when the GCD itself is not needed.
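The four steps of the hint can be traced with the following Python sketch, intended as a model for checking an assembly solution rather than as the solution itself. The helper names `gcd16` and `lcm_steps` are hypothetical; the result is returned as (dx, ax, bx) to mirror the registers named in the hint.

```python
def gcd16(a, b):
    """Euclid's algorithm, as in the GCD procedure (inputs assumed non-zero)."""
    while b:
        a, b = b, a % b
    return a


def lcm_steps(n1, n2):
    """Follow the hint's four steps. Returns (dx, ax, bx): LCM in DX:AX, GCD in BX."""
    si, di = n1, n2                   # step 1: save n1 and n2
    bx = gcd16(n1, n2)                # step 2: GCD, moved to bx
    ax, dx = divmod(si, bx)           # step 3: dx = 0, div bx
    assert dx == 0                    # step 4 relies on dx being zero here
    product = ax * di                 # step 4: mul di gives a 32-bit result
    return product >> 16, product & 0xFFFF, bx


dx, ax, bx = lcm_steps(0x1234, 0x1324)
print(f"LCM = {dx:04X}:{ax:04X}h, GCD = {bx:04X}h")   # LCM = 0011:6BC4h, GCD = 0014h
```

Dividing before multiplying keeps the quotient within 16 bits, so a single mul produces the full LCM in DX:AX.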

2. A program to produce a list of Fibonacci numbers not exceeding 16 bits: This is written here as a main program. Fibonacci numbers start with 0 and 1 and grow according to the equation Fn = Fn-1 + Fn-2, where Fn, Fn-1 and Fn-2 are three consecutive numbers of the sequence (with F0 = 0 and F1 = 1). Storing a list or array of numbers in memory is best done with the string instructions stosw and stosb, the former storing a word and the latter a byte. In both cases the segment register used is ES. However, when the result is displayed in the DEBUG environment, the default segment used is DS. So, unless there is a specific reason to keep DS and ES different, it is always convenient to have identical segment addresses in DS and ES, as is done in this program.

The assembly language program:

data segment
fibo    dw 200 dup (0)      ; unknown number of entries possible, so liberal provision
data ends
code segment
assume cs:code, ds:data, es:data    ; DS and ES are the same, as discussed above
start:  mov ax, data
        mov ds, ax
        mov es, ax
        sub ax, ax          ; the first number
        lea di, fibo
        stosw               ; stored
        mov bx, ax          ; first word goes to bx
        inc ax              ; second word in ax
back:   stosw               ; stored
        xchg ax, bx         ; two consecutive words now in ax, bx
        add ax, bx          ; add them to get the next word
        jnc back            ; does it go beyond the 16-bit limit? if not, go back
        int 01
code ends
end start

Testing in DEBUG:

-u 0 17
13EE:0000 B8D513        MOV     AX,13D5
13EE:0003 8ED8          MOV     DS,AX
13EE:0005 8EC0          MOV     ES,AX
13EE:0007 2BC0          SUB     AX,AX
13EE:0009 8D3E0000      LEA     DI,[0000]
13EE:000D AB            STOSW
13EE:000E 8BD8          MOV     BX,AX
13EE:0010 40            INC     AX
13EE:0011 AB            STOSW
13EE:0012 93            XCHG    BX,AX
13EE:0013 03C3          ADD     AX,BX
13EE:0015 73FA          JNB     0011
13EE:0017 CD01          INT     01
-g 7
AX=13D5  BX=0000  CX=01A9  DX=0000  SP=0000  BP=0000  SI=0000  DI=0000
DS=13D5  ES=13D5  SS=13D5  CS=13EE  IP=0007   NV UP EI PL NZ NA PO NC
13EE:0007 2BC0          SUB     AX,AX
-d 0 3f
13D5:0000  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13D5:0010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13D5:0020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13D5:0030  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
-g 17
AX=2511  BX=B520  CX=01A9  DX=0000  SP=0000  BP=0000  SI=0000  DI=0032
DS=13D5  ES=13D5  SS=13D5  CS=13EE  IP=0017   NV UP EI PL NZ NA PE CY
13EE:0017 CD01          INT     01
-d 0 3f
13D5:0000  00 00 01 00 01 00 02 00-03 00 05 00 08 00 0D 00   ................
13D5:0010  15 00 22 00 37 00 59 00-90 00 E9 00 79 01 62 02   ..".7.Y.....y.b.
13D5:0020  DB 03 3D 06 18 0A 55 10-6D 1A C2 2A 2F 45 F1 6F   ..=...U.m..*/E.o
13D5:0030  20 B5 00 00 00 00 00 00-00 00 00 00 00 00 00 00    ...............
-q

Note the UP (direction flag clear) indication in the flag register displays above; its significance is explained in the comments that follow the alternative program below.

An interesting alternative to this program, which does not use the string instruction stosw, is presented below in its essentials (without segment definitions etc.).


        xor ax, ax
        mov bx, 1           ; the first two numbers, 0 and 1
        lea si, list_start  ; start of the list
        mov [si], ax
up:     mov [si+2], bx
        add si, 4
        add ax, bx
        jc down             ; to terminate on carry
        mov [si], ax
        add bx, ax          ; note, bx is made the destination now
        jnc up              ; don't terminate if no carry
down:   int 01              ; terminate

This program also gives the same result as above.
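A Python model of the first (stosw) program is sketched below for predicting the DEBUG dump; the function name `fibo_stosw` is hypothetical. It stores each word, swaps and adds exactly as the loop does, and stops when the 16-bit addition produces a carry.

```python
def fibo_stosw():
    """Model of the stosw Fibonacci program. Returns (words, ax, bx, di)."""
    words = []
    ax = 0                        # sub ax, ax: the first number
    words.append(ax)              # stosw
    bx = ax                       # mov bx, ax
    ax += 1                       # inc ax: the second number
    while True:
        words.append(ax)          # back: stosw
        ax, bx = bx, ax           # xchg ax, bx
        total = ax + bx           # add ax, bx
        ax = total & 0xFFFF       # AX keeps only the low 16 bits
        if total > 0xFFFF:        # carry set, so jnc back is not taken
            break
    di = 2 * len(words)           # each stosw advances DI by 2 (D flag clear)
    return words, ax, bx, di


words, ax, bx, di = fibo_stosw()
print(len(words), hex(ax), hex(bx), hex(di))   # 25 0x2511 0xb520 0x32
```

Its result, 25 words ending in B520h with AX = 2511h, BX = B520h and DI = 0032h, matches the register display and memory dump above.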

Comments on the first program above (using the stosw instruction): We have missed an important point in that program: we have forgotten to set the direction flag so that the array address is incremented, that is, to make the direction ‘up’. Fortunately, the direction happened to be up, as the flag register display in DEBUG shows (UP), so the omission went unnoticed. The rule, however, is as follows:

Before using string instructions, always set the direction flag as required. The instruction STD (for decrementing the string address) or CLD (for incrementing the string address) must appear before the string instructions are used. In our program, the first stosw is the 6th instruction from the start label, so the CLD instruction should be inserted anywhere within the first 5 instructions from start. This has not been done in the program above, but the result has come out correct because the D flag is clear when we freshly enter DEBUG, as we have done here.

3. A program to find the sum of n numbers stored in an array in memory: This requirement is best met by using the string (array-handling) instructions to manage the addresses. Consider an array of bytes to be summed. The program reads the array data from memory, so the applicable instructions are lodsw and lodsb: lodsw loads a word into AX and lodsb a byte into AL, and each then updates the address in SI. The default segment is DS; the ES segment is not needed in this program. The .asm program is given below.

Program Sum.asm for adding an array of up to 64K unsigned bytes:

data segment
array   db 24h, 0a4h, 0bbh, 0fah, 58h, 23h
asize   dw $-array          ; notice the way of defining the array size
data ends
code segment
assume cs:code, ds:data
start:  mov ax, data
        mov ds, ax
        cld                 ; before we forget, we clear the D flag
        mov si, offset array
        mov cx, asize       ; array size in number of bytes
        sub dx, dx
        mov bx, dx          ; bx and dx will be used for the summing operation
        mov ax, dx          ; essentially to make the AH register = 0
back:   lodsb
        add bx, ax          ; collect the sum in BX
        adc dx, 0           ; any overflow from bx beyond the word size goes to DX
        loop back
        int 01
code ends
end start

The program above is simple enough to understand, but certain questions may arise. The data added are bytes, so why is a word addition (add bx, ax) done in the program? Why is 64K chosen as the maximum array size? What is the meaning of $-array, as used to define the array size (label asize) in the data segment? We will answer these questions one by one.

add bx, ax: The byte in AL is to be added to the sum in BX because, when bytes are added, the sum sooner or later overflows the byte size and becomes word-sized. By putting 0 in the AH register, we have turned the byte into a word in AX, which is then added to the sum collected so far in BX.

Size limit of 64K for the array: The offset address held in SI can take 2^16, or 64K (65536), different values. Hence we cannot easily handle a byte array larger than this. Note that for word arrays the maximum size easily handled is only 32K words. We must also note that when more than 257 byte-sized numbers are added, the result may exceed 16 bits (257 × FFh = FFFFh is the largest such sum that still fits), so the addition must then handle numbers of 3 bytes, and so on. If we add 64 KB of byte-sized data, we need to provide for full 24-bit addition, which the DX:BX pair in the program provides.

Meaning of $-array in the data segment: $ is the symbol for the current location (the offset at which the assembler is currently placing data), and array represents the offset of the label array, so the expression $-array automatically computes the array size in bytes. If a word array is to be handled, the size in bytes has to be halved. This can be done in the program by a single right shift of the byte count, or the size can be specified as ($-array)/2, in which case MASM performs the division while assembling the program.
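The effect of the add bx, ax / adc dx, 0 pair can be modeled in Python as below. This is an illustrative sketch (the name `sum_bytes` is hypothetical) that keeps BX to 16 bits and lets the carry accumulate in DX, so the worst cases discussed above can be tried directly.

```python
def sum_bytes(data):
    """Model of Sum.asm: returns (dx, bx), the 32-bit sum DX:BX of unsigned bytes."""
    dx = bx = 0                    # sub dx, dx / mov bx, dx
    for byte in data:              # lodsb ... loop back (one pass per byte)
        ax = byte                  # AH = 0, so AX is the byte zero-extended
        total = bx + ax            # add bx, ax
        bx = total & 0xFFFF        # BX keeps the low 16 bits
        dx += total >> 16          # adc dx, 0 adds the carry (0 or 1)
    return dx, bx


array = bytes([0x24, 0xA4, 0xBB, 0xFA, 0x58, 0x23])   # the data from Sum.asm
print(sum_bytes(array))                  # (0, 760), i.e. DX = 0, BX = 02F8h
print(sum_bytes(bytes([0xFF]) * 258))    # (1, 254): 258 bytes overflow 16 bits
```

With 257 bytes of FFh the sum is exactly FFFFh and DX stays zero; one more byte is enough to produce the first carry into DX.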

4. A program to find the approximate square root of a 16-bit number: Finding the square root is not an easy process. However, squaring a number is a relatively simple process, and it can be used to find the square root (at least approximately) as the following simple program shows.

Basic ALP for 16-bit number root finding:

; In this program, the number whose root is to be found is put in the dx register,
; and the exact root (if the number is a perfect square) or an approximate root
; (otherwise) is found in register cl. The square of the number in cl is shown in
; the ax register, so an approximate idea of the square root of the number in dx
; can be had. The essentials of the program are given below without comments.
; Try to work out the logic of the program.
        OR   DX, DX
        JNZ  SKIP
        MOV  CL, DL
        MOV  AX, DX
        INT  01
SKIP:   MOV  CX, 1
UP:     MOV  AL, CL
        MUL  AL
        CMP  AX, DX
        JAE  DOWN
        INC  CL
        JNZ  UP
        DEC  CL
DOWN:   INT  01

The program above tries the square of every number from 1 onwards until the square equals or exceeds the given value in the DX register. The process can be speeded up by first increasing in steps of 16 and then refining in steps of 1. Macros help here, as the following .asm program shows:

The program Sqrt.asm:

code segment
assume cs:code
; the macro
approx  macro n
        local ddd, dddd, uu
uu:     mov al, cl
        mul al
        cmp ax, dx
        ja ddd
        jz dddd
        add cl, n
        jz ddd
        jmp uu
ddd:    sub cl, n
dddd:
        endm
; the program using the macro
start:  or dh, dh
        mov cl, 0
        jz down
        approx 10h
down:   approx 1
        jz dn1
        mov al, cl
        mul al
dn1:    int 01
code ends
end start

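It should be noted that this program, when assembled, requires more memory space than the earlier program, because each macro call is expanded in full. For large numbers, however, it executes considerably faster, since it needs at most about 16 coarse steps and 16 fine steps instead of up to 255 single steps.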
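Both square-root programs can be modeled in Python to compare their results; the function names below are hypothetical. Traced instruction by instruction, the two programs do not stop at the same place for numbers that are not perfect squares: the first stops at the smallest CL whose square is at least DX (so CL may be one above the true root), while Sqrt.asm backs off to the largest CL whose square does not exceed DX. Both give the exact root for perfect squares.

```python
def sqrt_linear(dx):
    """First program: CL counts up from 1 until CL*CL >= DX. Returns (cl, ax)."""
    if dx == 0:                      # or dx, dx / jnz skip not taken
        return 0, 0                  # mov cl, dl / mov ax, dx
    cl = 1                           # mov cx, 1
    while True:
        ax = cl * cl                 # mov al, cl / mul al
        if ax >= dx:                 # cmp ax, dx / jae down
            return cl, ax
        cl = (cl + 1) & 0xFF         # inc cl (8-bit)
        if cl == 0:                  # jnz up not taken: CL wrapped past 255
            return 0xFF, ax          # dec cl; AX still holds 255*255


def approx(dx, cl, n):
    """One expansion of the approx macro with step n. Returns the new CL."""
    while True:
        square = cl * cl             # uu: mov al, cl / mul al
        if square > dx:              # cmp ax, dx / ja ddd
            return cl - n            # ddd: sub cl, n
        if square == dx:             # jz dddd: exact root found
            return cl
        cl = (cl + n) & 0xFF         # add cl, n (8-bit)
        if cl == 0:                  # jz ddd: CL wrapped past 255
            return 0x100 - n         # sub cl, n leaves 256 - n in CL


def sqrt_two_step(dx):
    """Sqrt.asm: steps of 10h (only if DH != 0), then steps of 1. Returns (cl, ax)."""
    cl = 0                           # mov cl, 0
    if dx >> 8:                      # or dh, dh / jz down
        cl = approx(dx, cl, 0x10)    # approx 10h
    cl = approx(dx, cl, 1)           # down: approx 1
    return cl, cl * cl               # AX = CL*CL on both exit paths


print(sqrt_linear(1000), sqrt_two_step(1000))   # (32, 1024) (31, 961)
```

For DX = 1000, for example, the first program leaves CL = 32 (AX = 1024), while Sqrt.asm leaves CL = 31 (AX = 961); comparing AX with DX tells which side of the true root each result lies on.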

5. Bubble Sort with Flagged Exchange: We will now look into a standard bubble sort operation on an array.

The bubble sort (ascending sort, treating the numbers as unsigned) ALP using a macro bigb, which bubbles the biggest element of the array down to the bottom of the array:

data segment
aray    dw 75c2h, 8d29h, 3bfbh, 3bfbh, 72f0h
data ends
code segment
assume cs:code, ds:data
start:  mov ax, data
        mov ds, ax
        mov cx, 5           ; array count n
        lea si, aray        ; array start address
        sub dx, dx          ; exchange flag
        mov di, si          ; save array start address
        dec cx              ; only n-1 bubbling passes necessary
up:     mov bp, cx          ; save this count for the next round
bigb    macro               ; big-to-the-bottom macro
        local back, down
        mov ax, [si]
back:   mov bx, 2[si]
        cmp ax, bx          ; is ax > bx?
        jbe down            ; if no, go down
        xchg ax, bx         ; else, exchange ax, bx
        inc dx              ; indicate the array is altered, making dx non-zero
down:   mov [si], ax        ; store the appropriate value at [si]
        mov ax, bx          ; adjust registers for the next bubble
        add si, 2           ; point to the next address and
        loop back           ; do the next bubble; cx = 0 here at the end
        mov [si], ax        ; store the last data, which is the biggest
        endm
        bigb                ; call to the macro
        cmp dx, cx          ; is dx = 0? (any alteration in the array?)
        jz over             ; if no, job over
        mov dx, cx          ; arrange to repeat; dx flag = 0
        mov cx, bp          ; recall the bubble count
        mov si, di          ; start address of the array is the same
        loop up             ; repeat bubbling, now with a count 1 less
over:   int 01              ; terminate the process
code ends
end start

Testing the program in DEBUG:

-u 0 35
13D6:0000 B8D513        MOV     AX,13D5
13D6:0003 8ED8          MOV     DS,AX
13D6:0005 B90500        MOV     CX,0005
13D6:0008 8D360000      LEA     SI,[0000]
13D6:000C 2BD2          SUB     DX,DX
13D6:000E 8BFE          MOV     DI,SI
13D6:0010 49            DEC     CX
13D6:0011 8BE9          MOV     BP,CX
13D6:0013 8B04          MOV     AX,[SI]     ; bigb macro expanded from here to 0027
13D6:0015 8B5C02        MOV     BX,[SI+02]
13D6:0018 3BC3          CMP     AX,BX
13D6:001A 7602          JBE     001E
13D6:001C 93            XCHG    BX,AX
13D6:001D 42            INC     DX
13D6:001E 8904          MOV     [SI],AX
13D6:0020 8BC3          MOV     AX,BX
13D6:0022 83C602        ADD     SI,+02
13D6:0025 E2EE          LOOP    0015
13D6:0027 8904          MOV     [SI],AX
13D6:0029 3BD1          CMP     DX,CX
13D6:002B 7408          JZ      0035
13D6:002D 8BD1          MOV     DX,CX
13D6:002F 8BCD          MOV     CX,BP
13D6:0031 8BF7          MOV     SI,DI
13D6:0033 E2DC          LOOP    0011
13D6:0035 CD01          INT     01
-g 5
AX=13D5  BX=0000  CX=0047  DX=0000  SP=0000  BP=0000  SI=0000  DI=0000
DS=13D5  ES=13C5  SS=13D5  CS=13D6  IP=0005   NV UP EI PL NZ NA PO NC
13D6:0005 B90500        MOV     CX,0005
-d 0 f
13D5:0000  C2 75 29 8D FB 3B FB 3B-F0 72 00 00 00 00 00 00   .u)..;.;.r......
; initial unsorted array: watch the five unsorted words 75C2, 8D29, 3BFB, 3BFB and 72F0
-g
AX=72F0  BX=72F0  CX=0000  DX=0000  SP=0000  BP=0002  SI=0004  DI=0000
DS=13D5  ES=13C5  SS=13D5  CS=13D6  IP=0037   NV UP EI PL ZR NA PE NC
13D6:0037 C6061D01FF    MOV     BYTE PTR [011D],FF                 DS:011D=E8
-d 0 f
13D5:0000  FB 3B FB 3B F0 72 C2 75-29 8D 00 00 00 00 00 00   .;.;.r.u).......
; final sorted array: the same words, rearranged in sorted order
-q
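The exchange-flag logic of the bigb program can be modeled in Python as follows. This is a sketch (the name `bubble_sort_flagged` is hypothetical) that swaps elements directly instead of carrying the larger one in AX, which has the same effect on the array; it also counts the passes so that the early termination can be seen.

```python
def bubble_sort_flagged(words):
    """Model of the bigb program (unsigned ascending sort with an exchange flag).
    Returns (sorted_words, passes)."""
    a = list(words)
    count = len(a) - 1                  # dec cx: at most n-1 bubbling passes
    passes = 0
    while count > 0:
        passes += 1
        exchanges = 0                   # DX = 0 at the start of each pass
        for i in range(count):          # bigb: count compare-and-store steps
            if a[i] > a[i + 1]:         # cmp ax, bx / jbe down not taken
                a[i], a[i + 1] = a[i + 1], a[i]
                exchanges += 1          # inc dx
        if exchanges == 0:              # cmp dx, cx / jz over
            break
        count -= 1                      # mov cx, bp / loop up
    return a, passes


result, passes = bubble_sort_flagged([0x75C2, 0x8D29, 0x3BFB, 0x3BFB, 0x72F0])
print([f"{w:04X}" for w in result], passes)   # ['3BFB', '3BFB', '72F0', '75C2', '8D29'] 3
```

The third pass, with a count of 2, finds no exchange, which agrees with BP = 0002 in the final register display above.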

The bubble sort reads the array from memory and also writes it back. Therefore the string instruction lods, or stos, or both could be used. Using both may be difficult, as it requires two registers for the inner loop, and these have to be reloaded from memory every time the outer loop is initialized. Two example programs are given below that use only the lods instruction. In them, some care is needed in handling the registers where the comparison decides which item is stored in memory, and in keeping track of array modification in the inner loop with the DX register. The two programs show two ways of keeping proper track. It may be simpler if only stos is used; the reader may try this alternative as an exercise. The instructions that track array modification in DX (inc dx and dec dx in version 1, inc dx in version 2) deserve careful study by the reader.

Bubble sort program without using a macro, version 1:

data segment        ; this program is tested and works OK
array   dw 1234h, 0abcdh, 1234h, 0fdcah, 2345h
count   dw 5
data ends
code segment
assume cs:code, ds:data
start:  mov ax, data
        mov ds, ax
        mov cx, count
        lea di, array
        dec cx
        cld
        sub dx, dx
up:     mov si, di
        mov bp, cx
        lodsw
        mov bx, ax
back:   lodsw
        inc dx
        cmp ax, bx
        jb down
        xchg ax, bx
        dec dx
down:   mov [si-4], ax
        loop back
        mov [si-2], bx
        cmp dx, cx
        mov dx, cx
        mov cx, bp
        loopnz up
        int 01
code ends
end start

Bubble sort program without a macro, version 2:

data segment        ; this program is tested and works OK
array   dw 1234h, 0abcdh, 1234h, 0fdcah, 2345h
count   dw 5
data ends
code segment
assume cs:code, ds:data
start:  mov ax, data
        mov ds, ax
        mov cx, count
        lea di, array
        dec cx
        cld
        sub dx, dx
up:     mov si, di
        mov bp, cx
        lodsw
back:   mov bx, ax
        lodsw
        cmp ax, bx
        jae down
        xchg ax, bx
        inc dx
down:   mov [si-4], bx
        loop back
        mov [si-2], ax
        cmp dx, cx
        mov dx, cx
        mov cx, bp
        loopnz up
        int 01
code ends
end start

Bubble sort program, version 3, using an inner loop within the outer loop instead of a macro:

code segment para
assume cs:code, ds:code, es:code
strt:   mov ax, code
        mov ds, ax
        mov es, ax
        cld
        mov si, offset arrstrt
        mov cx, count
        dec cx              ; n data items give n-1 pairs for comparison
back1:  mov bp, cx          ; save count in bp for the outer loop
        mov di, si          ; si remains constant, di is used in the loop
        sub dx, dx          ; ‘exchange’ track flag, initialized to zero
        mov ax, [di]
back:   mov bx, [di+2]      ; inner loop starts
        cmp ax, bx
        jle down            ; smaller of the two goes to memory at [di];
                            ; numbers interpreted as signed integers
        inc dx              ; alter dx from zero
        xchg ax, bx
down:   stosw
        mov ax, bx          ; the other data used in ax for the next compare
        loop back           ; inner loop over
        mov [di], ax        ; the last item is also stored in memory
        mov cx, bp          ; recall the outer loop count
        or dx, dx           ; any exchange in the inner loop?
        loopnz back1        ; if no exchange, or if the loop count is zero,
        int 1               ; then terminate
        jmp strt            ; given for facilitating testing again
        nop                 ; to misalign the start of the data part
        align 2             ; assembler directive for word alignment
arrstrt dw 1234h, 2348h, 8086h, 0abcdh, 0ffabh, 23ach
count   dw ($ - arrstrt)/2
code ends
end strt

Testing in DEBUG:

-u 0 34
13C0:0000 B8C013        MOV     AX,13C0
13C0:0003 8ED8          MOV     DS,AX
13C0:0005 8EC0          MOV     ES,AX
13C0:0007 FC            CLD
13C0:0008 BE3400        MOV     SI,0034
13C0:000B 8B0E4000      MOV     CX,[0040]
13C0:000F 49            DEC     CX
13C0:0010 8BE9          MOV     BP,CX
13C0:0012 8BFE          MOV     DI,SI
13C0:0014 2BD2          SUB     DX,DX
13C0:0016 8B05          MOV     AX,[DI]
13C0:0018 8B5D02        MOV     BX,[DI+02]
13C0:001B 3BC3          CMP     AX,BX
13C0:001D 7E02          JLE     0021
13C0:001F 42            INC     DX
13C0:0020 93            XCHG    BX,AX
13C0:0021 AB            STOSW
13C0:0022 8BC3          MOV     AX,BX
13C0:0024 E2F2          LOOP    0018
13C0:0026 8905          MOV     [DI],AX
13C0:0028 8BCD          MOV     CX,BP
13C0:002A 0BD2          OR      DX,DX
13C0:002C E0E2          LOOPNZ  0010
13C0:002E CD01          INT     01
13C0:0030 EBCE          JMP     0000
13C0:0032 90            NOP
13C0:0033 90            NOP                 ; this NOP is inserted by the assembler for alignment
13C0:0034 3412          XOR     AL,12
-d cs:34 41
13C0:0030              34 12 48 23-86 80 CD AB AB FF AC 23       4.H#.......#
13C0:0040  06 00                                                ..
-g
AX=1234  BX=1234  CX=0002  DX=0000  SP=0000  BP=0003  SI=0034  DI=003A
DS=13C0  ES=13C0  SS=13C0  CS=13C0  IP=0030   NV UP EI PL ZR NA PE NC
13C0:0030 EBCE          JMP     0000
-d 34 3f
13C0:0030              86 80 CD AB-AB FF 34 12 48 23 AC 23       ......4.H#.#
-q

Note: The termination is due to no exchange having taken place. The passes with cx = 5 and cx = 4 made exchanges; in the pass with cx = 3 there was no exchange, so the program terminated after the loopnz instruction had decremented cx to 2. The array after the pass with cx = 5 is 1234, 8086, abcd, ffab, 2348, 23ac; after the pass with cx = 4 it is 8086, abcd, ffab, 1234, 2348, 23ac; and after the pass with cx = 3 it is unchanged at 8086, abcd, ffab, 1234, 2348, 23ac, since no exchange took place. Hence the loop terminates with cx = 2, as seen.
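What decides the result of version 3 is the signed comparison jle. The Python sketch below (with the hypothetical names `to_signed16` and `signed_sort_trace`) interprets each word as a two's-complement value and records the array after every pass, so it can be compared with the note above; the returned count models CX after loopnz falls through.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a two's-complement signed integer."""
    return word - 0x10000 if word & 0x8000 else word


def signed_sort_trace(words):
    """Model of version 3 (signed compare with jle). Returns (states, cx):
    the array after each pass, and CX when loopnz falls through."""
    a = list(words)
    cx = len(a) - 1                     # mov cx, count / dec cx
    states = []
    while True:
        bp, dx = cx, 0                  # mov bp, cx / sub dx, dx
        for i in range(cx):             # inner loop: loop back, cx times
            if to_signed16(a[i]) > to_signed16(a[i + 1]):   # jle down not taken
                a[i], a[i + 1] = a[i + 1], a[i]
                dx += 1                 # inc dx
        states.append(list(a))
        cx = bp - 1                     # mov cx, bp; loopnz decrements CX
        if cx == 0 or dx == 0:          # loopnz falls through
            return states, cx


states, cx = signed_sort_trace([0x1234, 0x2348, 0x8086, 0xABCD, 0xFFAB, 0x23AC])
for state in states:
    print(" ".join(f"{w:04X}" for w in state))
print("CX =", cx)
```

Because 8086h, ABCDh and FFABh have their top bit set, they are negative as signed numbers and end up at the front of the array; an unsigned comparison (jbe, as in bigb) would place them at the end instead.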

6. Program to Copy an Array of Data: Copying a data array may be needed in several situations. Even a file can be copied with a data array copy program if we input the file as a collection of data words. This program is also of interest because it can use movsw, one of the very few instances of a direct memory-to-memory data operation on the 8086 processor. Recall at this point that the 8086 is designed as a register/memory or register/register processor, not as a memory/memory processor, so this is an important exception to that design feature. The string move instruction movsb or movsw has no operands; all its operands are implied. The instruction movsb does the following:

Copy the byte at DS:SI to the location ES:DI. The source data is in the data segment, at the offset in the source index register SI; the destination of the move is in the extra segment, at the offset in the destination index register DI. Providing different segment registers for the source and destination permits the full range of memory to be used: the source data and the destination can each be anywhere in the total memory of 1 MB. Using the same segment register for both would confine the move to a single 64 KB segment; with separate segment registers, distant memory locations may be used for the copy. After the move, the string move instructions also update the addresses: if the direction flag D is clear, SI and DI are both incremented (by 1 for a byte move and by 2 for a word move), while if D is set, both are decremented by the same amount.

Why this choice of address increment or decrement? If the source and destination blocks overlap, there can be problems, as illustrated below. Suppose the source block is 100h to 1FFh and the destination block starts at 180h in the same segment, so that the blocks overlap in the range 180h to 1FFh. If we start at the beginning of the source block, 100h, and do movsb operations, we copy the byte at 100h into 180h. But another source byte is sitting at 180h, and it is lost by this operation. Continuing in this way, the data in the block from 180h to 1FFh is overwritten with the data from 100h to 17Fh before it can be copied. If instead we start the transfer from the other end, taking the byte at 1FFh of the source to 27Fh of the destination, and keep decrementing the addresses to continue the copying, all our data is transferred safely without loss. This requires operation with the D flag set. With overlap, if the source start address is larger than the destination start address, it is easily checked that the data is copied safely when the addresses increase each time, that is, with the D flag cleared. If there is no overlap between the source and destination blocks, either setting of the D flag works. Combining all these, if the absolute physical address of the source array start is lower than the destination start address, the transfer should start from the array end with the D flag set, irrespective of whether there is overlap or not. If the source start address is greater than the destination start address, the transfer can begin from the start of the array with the D flag cleared, again irrespective of overlap. In the trivial case where the source and destination start addresses are the same, no move is needed. Block Move Program 1 below shows the operations when the source address is lower than the destination address and ES = DS; Program 2 considers the case where DS and ES are different.
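The overlap argument can be demonstrated with a small Python model of rep movsb operating on a bytearray that stands for one segment. The function names and the test pattern (the values 0 to 255 placed at 100h to 1FFh) are assumptions of this sketch; only the copying rule is taken from the text.

```python
def rep_movsb(mem, si, di, cx, direction_down):
    """Model of rep movsb inside one segment (mem stands for the segment).
    Copies cx bytes one at a time, stepping SI and DI by -1 if the D flag
    is set (direction_down=True) or by +1 if it is clear."""
    step = -1 if direction_down else 1
    for _ in range(cx):
        mem[di] = mem[si]        # movsb: one byte from [si] to [di]
        si += step
        di += step
    return mem


def copy_block(mem, src, dst, length):
    """The rule from the text: source below destination, start at the block
    end with D set; source above destination, start at the block start with
    D clear; same start, nothing to move."""
    if src == dst:
        return mem
    if src < dst:
        return rep_movsb(mem, src + length - 1, dst + length - 1, length, True)
    return rep_movsb(mem, src, dst, length, False)


# The example from the text: source 100h to 1FFh, destination from 180h.
segment = bytearray(0x300)
segment[0x100:0x200] = bytes(range(256))         # assumed test pattern
wrong = rep_movsb(bytearray(segment), 0x100, 0x180, 0x100, direction_down=False)
right = copy_block(bytearray(segment), 0x100, 0x180, 0x100)
print(wrong[0x180:0x280] == bytes(range(256)))   # False: data lost
print(right[0x180:0x280] == bytes(range(256)))   # True
```

Copying with the addresses incrementing reproduces the loss described above: the second half of the destination repeats the first half. The decrementing copy chosen by copy_block transfers all 256 bytes intact.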

Repeat prefix: The rep prefix can be used in this context instead of writing a transfer loop. After initializing the SI, DI, DS and ES registers, load the CX register with the byte count (for movsb) or the word count (for movsw), and then use the instruction with the rep prefix. The move is repeated, decrementing CX each time, until CX becomes zero.

Exercise: The data block from 13D5:1322h to 13D5:1351h is to be moved to 13D7:1320h onwards. What should be the D flag setting? Give an assembly language program in the debug environment to do the job.

Below is Program 1, which copies a data array starting at the memory location labeled blok to a destination in the same segment. Note that the instruction mov di, dest loads the contents of the word labeled dest (3), not its offset, so the destination block starts at offset 3. The destination address is therefore above the source address, and the source and destination blocks overlap. It is thus necessary to move the data starting from the end of the block and working back toward its start, with the D flag set. The program is given below.

Example: Block Move Program 1:

data segment
blok    dw 1234h, 5678h, 9abch, 0cdefh, 2345h, 789ah
count   dw ($-blok)/2
        dw 10 dup (0)
dest    dw 3
data ends
code segment
assume cs:code, ds:data, es:data
strt:   mov ax, data
        mov ds, ax
        mov es, ax
        mov si, offset blok
        mov di, dest
        mov cx, count
        mov ax, cx
        dec ax
        shl ax, 1
        add si, ax
        add di, ax
        std
        rep movsw
        int 01
code ends
end strt

-u 0 1e
13D8:0000 B8D513        MOV     AX,13D5
13D8:0003 8ED8          MOV     DS,AX
13D8:0005 8EC0          MOV     ES,AX
13D8:0007 BE0000        MOV     SI,0000
13D8:000A 8B3E2200      MOV     DI,[0022]
13D8:000E 8B0E0C00      MOV     CX,[000C]
13D8:0012 8BC1          MOV     AX,CX
13D8:0014 48            DEC     AX
13D8:0015 D1E0          SHL     AX,1
13D8:0017 03F0          ADD     SI,AX
13D8:0019 03F8          ADD     DI,AX
13D8:001B FD            STD
13D8:001C F3            REPZ
13D8:001D A5            MOVSW
13D8:001E CD01          INT     01
-g 1b
AX=000A  BX=0000  CX=0006  DX=0000  SP=0000  BP=0000  SI=000A  DI=000D
DS=13D5  ES=13D5  SS=13D5  CS=13D8  IP=001B   NV UP EI PL NZ NA PO NC
13D8:001B FD            STD
-d 0 f
13D5:0000  34 12 78 56 BC 9A EF CD-45 23 9A 78 06 00 00 00   4.xV....E#.x....
-t
AX=000A  BX=0000  CX=0006  DX=0000  SP=0000  BP=0000  SI=000A  DI=000D
DS=13D5  ES=13D5  SS=13D5  CS=13D8  IP=001C   NV DN EI PL NZ NA PO NC
13D8:001C F3            REPZ
13D8:001D A5            MOVSW
-g
AX=000A  BX=0000  CX=0000  DX=0000  SP=0000  BP=0000  SI=FFFE  DI=0001
DS=13D5  ES=13D5  SS=13D5  CS=13D8  IP=0020   NV DN EI PL NZ NA PO NC
13D8:0020 0000          ADD     [BX+SI],AL                         DS:FFFE=DB
-d 0 f
13D5:0000  34 12 78 34 12 78 56 BC-9A EF CD 45 23 9A 78 00   4.x4.xV....E#.x.
-q


Certain features of the above program may need explanation. First, the entry for count in the data segment defines the count as word-sized data with a simple expression: the number of data bytes is $-blok, and this divided by 2 is the number of words in blok. The assembler computes this value during assembly. Second, it is easily worked out that in this case the data transfer must start from the tail end of the data block. Using the word count, the tail addresses of the source and destination blocks have to be computed: the instructions from mov ax, cx to add di, ax perform this calculation (tail offset = (count - 1) × 2), and std then sets the D flag. If the data could be transferred starting from the head end, none of this would be needed, and a simple CLD would suffice to ensure address incrementing.
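The following Python sketch replays Program 1 on a model of its data segment. It assumes that the segment's bytes are laid out exactly as MASM assembles them, starting at offset 0 (the DEBUG dump above confirms this layout); the function name `block_move_1` is hypothetical.

```python
def block_move_1():
    """Replay Block Move Program 1 on a model of its data segment.
    Returns (memory, si, di) after rep movsw finishes."""
    mem = bytearray()
    for word in (0x1234, 0x5678, 0x9ABC, 0xCDEF, 0x2345, 0x789A):
        mem += word.to_bytes(2, "little")       # blok dw ... (low byte first)
    count = len(mem) // 2                       # count dw ($-blok)/2, i.e. 6
    mem += count.to_bytes(2, "little")          # count at offset 0Ch
    mem += bytes(20)                            # dw 10 dup (0)
    mem += (3).to_bytes(2, "little")            # dest dw 3, at offset 22h

    si = 0                                      # mov si, offset blok
    di = int.from_bytes(mem[0x22:0x24], "little")   # mov di, dest: the word AT dest
    cx = int.from_bytes(mem[0x0C:0x0E], "little")   # mov cx, count
    tail = (cx - 1) * 2                         # mov ax, cx / dec ax / shl ax, 1
    si += tail                                  # add si, ax -> 000Ah
    di += tail                                  # add di, ax -> 000Dh
    while cx:                                   # std / rep movsw
        mem[di:di + 2] = mem[si:si + 2]         # one word from [si] to [di]
        si = (si - 2) & 0xFFFF                  # D set: both offsets step down by 2
        di = (di - 2) & 0xFFFF
        cx -= 1
    return mem, si, di


mem, si, di = block_move_1()
print(mem[:16].hex(" "), hex(si), hex(di))
```

The first 16 bytes it produces, together with the final SI = FFFEh and DI = 0001h, agree with the DEBUG results above; note that the copy also overwrites the word count itself, which is harmless because CX was loaded before the move.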

Example: Block Move Program 2: In this example, the ES and DS segments are deliberately made different, and the actual physical memory addresses of the source and destination of the block move may or may not overlap. The program first computes the difference between the physical addresses of the source and the destination, and then decides how to set the direction flag. This gives an example of a very general block move operation. In the program below, the source array is moved to the destination location. The program is assembled and tested as shown. The segments and data are arranged so that the destination lies above the source, so the move starts from the end address with the D flag set. The result of executing the program in DEBUG is shown.

; The Block2.asm program
;
data  segment
arr   db 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
      db 16, 17, 18, 19, 20          ; source array
aln   db ?
n     dw ($-arr-1), 51 dup (0)
data  ends
;
extra segment
ddi   dw ?
      dw 32 dup(0)
extra ends
;
code  segment
      assume cs:code, ds:data, es:extra
strt: mov ax, data
      mov ds, ax
      mov ax, extra
      mov es, ax
      cld
; the macro below compares the ds:si address with the es:di address
cmpadr macro
;; this macro compares the physical addresses of the
;; source and destination; it can be used generally
;; in such block move situations.
      local cmp1
      push dx
      push cx
      sub cx, cx
      mov ax, ds
      mov dx, es
      sub ax, dx
      mov dx, 16
      imul dx
      mov di, offset [ddi]
      sub ax, di
      sbb dx, cx
      mov si, offset [arr]
      add ax, si
      adc dx, cx
      or ax, dx
      jz cmp1
      inc cl
      rol dx, 1
      jnc cmp1
      neg cx
cmp1: mov ax, cx
      pop cx
      pop dx
      endm
      cmpadr              ; invoke the macro
      mov si, offset [arr]
      mov di, offset [ddi]
      or ax, ax
      jz over
      mov cx, n
      jns down
      mov ax, cx
      dec ax
      add si, ax
      add di, ax
      std
down: rep movsb
      cld
over: int 1
code  ends
      end strt
; The code, up to and including int 1, occupies offsets 0000h-0054h (55h bytes).
; Execution of the above program gave the following results:
-g
AX=0014 BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=FFFF DI=FFFF
DS=13DC ES=13E4 SS=13DC CS=13E9 IP=0055 NV UP EI PL NZ NA PE NC
13E9:0055 47 INC DI
-d 0 bf
13DC:0000 00 01 02 03 04 05 06 07-08 09 0A 0B 0C 0D 0E 0F ................
13DC:0010 10 11 12 13 14 00 15 00-00 00 00 00 00 00 00 00 ................
13DC:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0040 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0050 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0060 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0070 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; Note: the extra segment starts here (13DC:0080 is the same as 13E4:0000)
13DC:0080 00 01 02 03 04 05 06 07-08 09 0A 0B 0C 0D 0E 0F ................
13DC:0090 10 11 12 13 14 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; 13DC:0000-0014: original source data; 13DC:0080-0094: copied data

The macro cmpadr given above assumes three things: the ES and DS segments are different, the source array starts at offset arr in the data segment, and the destination array is in the extra segment starting at offset ddi. The macro preserves CX and DX. It leaves the source and destination offsets in SI and DI and returns its result in AX.

If the physical (absolute) address of the destination array is lower than that of the source array, data can be moved from the start to the end of the array, and the result is correct whether or not the arrays overlap. In this case AX = 1 at the end of the macro. If the destination address is higher than the source address, AX = -1, and the program moves the data from the tail end with the D flag set. In the rare case that the two addresses are the same, no move is needed at all: AX = 0 and the zero flag is set at the end of the macro.

If the data and extra segments are the same, the cmpadr macro is not needed, and the program of Block Move Program 1 given earlier is adequate.
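The sign test performed by cmpadr can be checked with a small Python model. The model forms the physical addresses segment * 16 + offset and compares them. This is a sketch, and the function names are hypothetical. Like the macro's single 16-bit IMUL, it assumes that the two segment values differ by less than 8000h paragraphs. The test values are the segment numbers from the DEBUG run above, where 13DC:0080 and 13E4:0000 are the same location.

```python
def phys(seg: int, off: int) -> int:
    """20-bit physical address of seg:off on the 8086 (seg * 16 + off)."""
    return ((seg << 4) + off) & 0xFFFFF


def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed word, as IMUL does."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def cmpadr(ds: int, arr: int, es: int, ddi: int) -> int:
    """Model of the result cmpadr leaves in AX.

    1  -> source ds:arr lies above destination es:ddi (move head-first, CLD)
    -1 -> source lies below the destination (move tail-first, STD)
    0  -> same physical address (no move needed)
    """
    diff = to_signed16(ds - es) * 16 - ddi + arr   # DX:AX after IMUL/SUB/SBB/ADD/ADC
    if diff == 0:
        return 0
    return 1 if diff > 0 else -1
```

The comparison works on the full 32-bit DX:AX difference rather than on the offsets alone. This is why the macro still decides correctly when the segments differ.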

A special example of block handling – Block reversal in situ: A requirement may sometimes arise to reverse an array and put it back in the same location without using additional memory. The macro below does this job. Before invoking the macro, load SI with the offset of the start of the array in the data segment. The number of entries in the array is given by the parameter n. The parameter bw gives the type of the array: bw = 1 for a byte array and bw = 2 for a word array. For example, the invocation 'rev 2, 50' reverses an array of 50 words. The rev1 loop makes the offset in DI point to the last element of the array (it adds n - 1 to DI, bw times). The rev2 loop then interchanges the first element with the last and moves similarly inwards from both ends towards the middle. Note how this loop terminates at the middle of the array. Satisfy yourself that the rev2 loop reverses the array correctly both when n is even and when n is odd. The macro uses the DI, SI, AX and CX registers and clears the direction flag.

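rev   macro bw, n
      local rev1, rev2
      mov di, si
      mov ax, n
      dec ax              ;; AX = n - 1
      mov cx, bw
rev1: add di, ax          ;; done bw times: DI = SI + bw*(n - 1)
      loop rev1
      cld
rev2:
  if bw eq 1              ;; byte array: move one byte at a time
      lodsb
      xchg al, [di]
      mov [si-1], al
  else                    ;; word array
      lodsw
      xchg ax, [di]
      mov [si-2], ax
  endif
      sub di, bw
      cmp si, di
      jb rev2
      endm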
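The behaviour of the rev macro for even and odd n can be checked with a Python model that uses the same pointer arithmetic. This is an illustrative sketch. The function name `rev` mirrors the macro, but the `bytearray` "memory" and the argument order are assumptions. Like the macro, the swap loop always runs at least once, so n = 1 simply swaps the single element with itself.

```python
def rev(mem: bytearray, si: int, bw: int, n: int) -> None:
    """Reverse n elements of bw bytes each (bw = 1 or 2) in place, starting at si."""
    if bw not in (1, 2) or n < 1:
        raise ValueError("bw must be 1 or 2 and n must be at least 1")
    di = si + bw * (n - 1)                  # rev1: DI points at the last element
    while True:                             # rev2: body runs at least once
        first = mem[si:si + bw]             # LODSB/LODSW loads the element at SI
        mem[si:si + bw] = mem[di:di + bw]   # the element from DI goes to [SI - bw]
        mem[di:di + bw] = first             # XCHG leaves the old first element at DI
        si += bw
        di -= bw
        if not si < di:                     # CMP SI,DI / JB rev2
            break
```

With an odd n the loop stops when SI and DI meet, so the middle element is never touched.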

7. Checking whether a given 16-bit number is prime: This program is slightly less simple than the other programs we have been studying, and it provides a good example of the use of macros in a program. The logic of the program is as follows.

Step 1. Check that the number is valid. We consider the numbers 0 and 1 as invalid inputs for this program; all other numbers are valid.

Step 2. Check whether the number is even, that is, divisible by 2. This check is easily done with the rotate instructions. A rotate right followed by a rotate left puts the least significant bit of the number in the carry flag and still leaves the original number unaltered. This is the simplest way to test whether any number is odd or even while keeping the number unchanged.

Step 3. Try dividing the number by odd numbers one by one, testing whether the remainder is zero. Stop as soon as a remainder is zero and declare the number not prime. The process can be improved: once the number is known not to be divisible by 3, division by odd multiples of 3 can be skipped. The sequence of trial divisors is thus 3, 5, 7, 11, 13, 17, 19, 23, 25, etc. (after 5, add 2 and 4 alternately).

Step 4. Terminate the process and declare the number prime when the square of the trial divisor exceeds the given number. The logic of this step is as follows. If a number had a factor greater than its square root, the quotient of that division would also be a factor of the number, and it would be less than the square root. So if no trial divisor up to the square root divides the number evenly, there cannot be a factor greater than the square root either.

In the program given below, two steps are bundled into a macro: checking whether the next trial divisor exceeds the square root, and dividing to see whether it is a factor of the given number. The program follows.

The Assembly language program
; This program considers an input number in register ax. The output from the
; program is in register ax. If ax = 0 at the output, the number is a prime.
; If ax = -1 (FFFF H), the number is not a prime; in this case the smallest
; prime factor of the number is in register cx. If ax has the value ABCD H,
; the input number is invalid (0 or 1); in this case the CY flag is also set.
; In all other cases the CY flag is reset. The input is found in bx at the
; output stage.
code  segment
      assume cs:code
strt: mov bx, ax        ; the number input in ax is saved in bx
      mov cx, 2
      cmp cx, bx
      ja invalid
      jz prime
      ror bx, 1
      rol bx, 1         ; lsb is now in CY also, and bx is unaltered
      jnc nprime
checkp macro n
      add cl, n         ; get the next trial factor
      jc prime          ; if cl exceeds 8 bits, the number is prime
      mov ax, cx
      mul cx            ; get the square of the trial factor; this also makes dx = 0
      cmp ax, bx
      jz nprime         ; if the square equals the number, it is not prime
      ja prime          ; if the square is greater, the number is prime
      mov ax, bx
      div cx
      or dx, dx         ; if there is no remainder, then
      jz nprime         ; the number is not a prime
      endm
      checkp 1          ; testing trial factor 3, adding 1 to cx register
      checkp 2          ; testing trial factor 5, add 2 to cx now
back: checkp 2          ; testing trial factors 7, 13, 19, etc.
      checkp 4          ; testing trial factors 11, 17, 23, etc.
      jmp back          ; loop to increase the trial factor by 6
invalid: mov ax, 0abcdh
      stc
      jmp finish
prime: mov ax, 00
      jmp next
nprime: mov ax, -1
next: clc
finish: int 01
      jmp strt          ; used for repeated testing in the debug
code  ends
      end strt

Testing in debug
-u 0 82
13D5:0000 8BD8 MOV BX,AX
13D5:0002 B90200 MOV CX,0002
13D5:0005 3BCB CMP CX,BX
13D5:0007 7766 JA 006F
13D5:0009 746B JZ 0076
13D5:000B D1CB ROR BX,1
13D5:000D D1C3 ROL BX,1
13D5:000F 736B JNB 007C
13D5:0011 80C101 ADD CL,01
13D5:0014 7260 JB 0076
13D5:0016 8BC1 MOV AX,CX
13D5:0018 F7E1 MUL CX
13D5:001A 3BC3 CMP AX,BX
13D5:001C 745E JZ 007C         ; macro checkp 1, expanded
13D5:001E 7756 JA 0076
13D5:0020 8BC3 MOV AX,BX
13D5:0022 F7F1 DIV CX
13D5:0024 0BD2 OR DX,DX
13D5:0026 7454 JZ 007C
13D5:0028 80C102 ADD CL,02
13D5:002B 7249 JB 0076
13D5:002D 8BC1 MOV AX,CX
13D5:002F F7E1 MUL CX
13D5:0031 3BC3 CMP AX,BX
13D5:0033 7447 JZ 007C         ; macro checkp 2, expanded
13D5:0035 773F JA 0076
13D5:0037 8BC3 MOV AX,BX
13D5:0039 F7F1 DIV CX
13D5:003B 0BD2 OR DX,DX
13D5:003D 743D JZ 007C
13D5:003F 80C102 ADD CL,02
13D5:0042 7232 JB 0076
13D5:0044 8BC1 MOV AX,CX
13D5:0046 F7E1 MUL CX
13D5:0048 3BC3 CMP AX,BX
13D5:004A 7430 JZ 007C         ; macro checkp 2, expanded
13D5:004C 7728 JA 0076
13D5:004E 8BC3 MOV AX,BX
13D5:0050 F7F1 DIV CX
13D5:0052 0BD2 OR DX,DX
13D5:0054 7426 JZ 007C
13D5:0056 80C104 ADD CL,04
13D5:0059 721B JB 0076
13D5:005B 8BC1 MOV AX,CX
13D5:005D F7E1 MUL CX
13D5:005F 3BC3 CMP AX,BX
13D5:0061 7419 JZ 007C         ; macro checkp 4, expanded
13D5:0063 7711 JA 0076
13D5:0065 8BC3 MOV AX,BX
13D5:0067 F7F1 DIV CX


13D5:0069 0BD2 OR DX,DX
13D5:006B 740F JZ 007C
13D5:006D EBD0 JMP 003F
13D5:006F B8CDAB MOV AX,ABCD
13D5:0072 F9 STC
13D5:0073 EB0B JMP 0080
13D5:0075 90 NOP
13D5:0076 B80000 MOV AX,0000
13D5:0079 EB04 JMP 007F
13D5:007B 90 NOP
13D5:007C B8FFFF MOV AX,FFFF
13D5:007F F8 CLC
13D5:0080 CD01 INT 01
13D5:0082 E97BFF JMP 0000

Executing the program in the debug
-rax
AX 0000
:ffdf
-g
AX=FFFF BX=FFDF CX=001F DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0082 NV UP EI PL ZR NA PE NC
13D5:0082 E97BFF JMP 0000      ; 1F is the smallest prime factor of FFDF
-rax
AX FFFF
:ffef
-g
AX=0000 BX=FFEF CX=0001 DX=00F5 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0082 NV UP EI PL NZ AC PO NC
13D5:0082 E97BFF JMP 0000      ; FFEF is a prime
-g                             ; note: 0 is in the ax register for this run
AX=ABCD BX=0000 CX=0002 DX=00F5 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0082 NV UP EI PL NZ NA PO CY
13D5:0082 E97BFF JMP 0000      ; the input is invalid in this case
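The four steps can be modelled in Python to check the program's results on a PC. This is a sketch, not the book's code. The names `check_prime16` and `trial_divisors` are hypothetical, and the return values imitate the AX (and CX) conventions described in the program's header comment. The test values FFDFh and FFEFh are the ones used in the DEBUG runs above.

```python
INVALID, PRIME, NOT_PRIME = 0xABCD, 0x0000, 0xFFFF


def trial_divisors():
    """3, 5, then add 2 and 4 alternately: 7, 11, 13, 17, 19, 23, 25, ..."""
    yield 3
    d = 5
    while True:
        yield d
        d += 2
        yield d
        d += 4


def check_prime16(n):
    """Return (ax, factor) as the 8086 program reports them.

    (0xABCD, None) for 0 or 1, (0, None) for a prime, and
    (0xFFFF, p) for a composite number with smallest prime factor p.
    """
    if not 0 <= n <= 0xFFFF:
        raise ValueError("input must be a 16-bit value")
    if n < 2:
        return INVALID, None
    if n == 2:
        return PRIME, None
    if n % 2 == 0:                      # ROR/ROL leaves the lsb in CF
        return NOT_PRIME, 2
    for d in trial_divisors():
        if d > 0xFF:                    # ADD CL,n set CF: d*d exceeds any 16-bit n
            return PRIME, None
        square = d * d                  # MUL CX; fits in AX because d < 256
        if square == n:
            return NOT_PRIME, d
        if square > n:
            return PRIME, None
        if n % d == 0:                  # DIV CX leaves the remainder in DX
            return NOT_PRIME, d
```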

Note: The above program performs many unnecessary operations, because it tries division by every number that is not a multiple of 2 or 3, whether or not that number is prime (25, 35, 49, ...). These extra checks can easily be avoided if a table of the prime numbers less than 256 is provided. Such a table is given in the data segment of the program listing below. There are only 53 such primes (excluding 2), so it is enough to test division of the given number by these numbers only. The program given earlier checks 2 numbers in every group of 6, which means about 84 numbers in all up to 256 if the full range is covered. The program here requires only a slight modification of the earlier program, and is presented below.

Here is the program:
data  segment
; here is the table of primes, ending with -1 (255 as a byte)
prlst db 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61
      db 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131
      db 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193
      db 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 255
data  ends
code  segment
      assume cs:code, ds:data
strt: mov bx, ax        ; the number input in ax is saved in bx
      mov ax, data
      mov ds, ax
      mov cx, 2
      cmp cx, bx
      ja invalid
      jz prime
      ror bx, 1
      rol bx, 1         ; lsb is now in CY also, and bx is unaltered
      jnc nprime
      cld
      sub ch, ch
      lea si, prlst
checkp macro            ; this macro checks if the next prime
                        ; is a factor of the given number
      sub dx, dx        ; prepare for word division
      lodsb             ; get the next prime from the table
      cmp al, -1        ; check for end of table
      jz prime          ; if yes, the number is prime
      mov cl, al
      mul cl
      cmp ax, bx
      jz nprime
      ja prime
      mov ax, bx
      div cx            ; word division; reason out why?
      or dx, dx
      jz nprime
      endm
;
back: checkp
      jmp back          ; in this loop we check if the next prime is a factor.
invalid: mov ax, 0abcdh
      stc
      jmp finish
prime: mov ax, 00
      jmp next
nprime: mov ax, -1
next: clc
finish: int 01
code  ends
      end strt

The reader should be able to verify that the program works correctly.
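The prlst table can be regenerated, and the count of 53 odd primes below 256 checked, with a short Sieve of Eratosthenes in Python. This is an illustrative sketch; the names `primes_below`, `PRLST` and `smallest_factor_from_table` are hypothetical. Reaching the end marker after 251 is enough to declare a 16-bit number prime. This is because 251² = 63001 still fits in 16 bits, while the square of the next prime, 257, exceeds FFFFh.

```python
def primes_below(limit):
    """Sieve of Eratosthenes: all primes p with 2 <= p < limit."""
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


# Odd primes below 256, then the end marker 255 (db -1 stored as a byte).
PRLST = [p for p in primes_below(256) if p != 2] + [0xFF]


def smallest_factor_from_table(n, table=PRLST):
    """Table-driven trial division as in the second program; None means prime."""
    if n < 2:
        raise ValueError("0 and 1 are invalid inputs")
    if n % 2 == 0:
        return None if n == 2 else 2
    for p in table:
        if p == 0xFF or p * p > n:      # end of table, or p is past sqrt(n)
            return None
        if n % p == 0:
            return p
```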

8. A program for counting the leading 0's in 32-bit data in registers DX:AX: When handling the normalization of floating point numbers, it becomes necessary to count the leading 0's in double-length numbers. The program is given below without any comments or demonstration of its working. Interested readers may take it as an exercise: write appropriate comments, and study its working by assembling it and testing it in DEBUG. The program returns the input data intact in DX:AX and the count of leading zeros in CX.

THE PROGRAM FOR COUNTING THE LEADING ZEROS OF DATA IN DX:AX REGISTERS


code  segment
      assume cs:code
strt: push dx
      push ax
      sub cx, cx
      or dx, dx
      jnz d16
      mov cx, 16
      mov dx, ax
d16:  or dh, dh
      jnz d8
      add cx, 8
      mov dh, dl
d8:   test dh, 0f0h
      jnz d4
      add cx, 4
      shl dh, 1
      shl dh, 1
      shl dh, 1
      shl dh, 1
d4:   test dh, 0C0h
      jnz d2
      add cx, 2
      shl dh, 1
      shl dh, 1
d2:   test dh, 080h
      jnz d1
      add cx, 1
d1:   or dh, dh
      jnz d0
      add cx, 1
d0:   pop ax
      pop dx
      int 01
      jmp strt          ; to repeat with a new set of data
code  ends
      end strt
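As an aid to the exercise, here is a Python model of the program's strategy. It is a binary search that narrows the field to be examined at each step (16, 8, 4, 2 and 1 bits). It is a sketch for checking results against DEBUG, not a replacement for the assembly version; the function name `clz32` is hypothetical.

```python
def clz32(dx: int, ax: int) -> int:
    """Count the leading zeros of the 32-bit value DX:AX, step by step like the program."""
    cx = 0
    if dx == 0:                 # high word zero: 16 zeros, continue with AX
        cx = 16
        dx = ax
    dh, dl = dx >> 8, dx & 0xFF
    if dh == 0:                 # d16: high byte zero, continue with the low byte
        cx += 8
        dh = dl
    if (dh & 0xF0) == 0:        # d8: top nibble zero, shift it out
        cx += 4
        dh = (dh << 4) & 0xFF
    if (dh & 0xC0) == 0:        # d4: top two bits zero, shift them out
        cx += 2
        dh = (dh << 2) & 0xFF
    if (dh & 0x80) == 0:        # d2: top bit zero
        cx += 1
    if dh == 0:                 # d1: only possible when the whole input is 0
        cx += 1
    return cx
```

The final `dh == 0` test is what turns the count of 31 into 32 for an all-zero input.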

9. A slightly different program for counting the leading zeros in 64-bit data stored in registers AX, BX, DX and BP, with AX holding the most significant word: Here is a different approach to the whole process. The intention in this program is to keep it small and free of any serious logical complexity. However, it takes more time than the earlier program, since it shifts the data one bit at a time. The program and partial testing of it are given below without much comment. The program shifts the data so that the most significant 1 bit becomes the leading bit of register AX. The leading zeros are shifted out, and an equal number of trailing zeros is inserted.

; This program counts the leading 0's in the quad word stored in regs.
; ax, bx, dx, bp, with the high word in ax. The program also shifts the data
; so that the first non-zero bit comes as the msb in ax, with the entire data
; shifted across all 4 registers, inserting as many trailing zeros as required.
; The count of the leading zeros is available in cx.
;
code  segment
      assume cs:code
strt: mov cx, 41h       ; 1 extra count - the loop starts with the loopz instruction
      jmp down
back: add bp, bp
      adc dx, dx
      adc bx, bx
      adc ax, ax
down: test ah, 80h
      loopz back
      sub cx, 40h
      neg cx
      int 1
      jmp strt
code  ends
      end strt

-u 0 18
13DC:0000 B94100 MOV CX,0041
13DC:0003 EB09 JMP 000E
13DC:0005 90 NOP
13DC:0006 03ED ADD BP,BP
13DC:0008 13D2 ADC DX,DX
13DC:000A 13DB ADC BX,BX
13DC:000C 13C0 ADC AX,AX
13DC:000E F6C480 TEST AH,80
13DC:0011 E1F3 LOOPZ 0006
13DC:0013 83E940 SUB CX,+40
13DC:0016 F7D9 NEG CX
13DC:0018 CD01 INT 01
-r bp                ; initially ax, bx, dx, bp are all 0's; bp is made = 0011h
BP 0000
:11
-g
AX=8800 BX=0000 CX=003B DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=001A NV UP EI PL NZ AC PO CY
13DC:001A EBE4 JMP 0000
-rax
AX 8800
:0
-g
AX=0000 BX=0000 CX=0040 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=001A NV UP EI PL NZ NA PO CY
13DC:001A EBE4 JMP 0000
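A Python model of the same loop shows why CX starts at 41h rather than 40h. The first LOOPZ decrements CX before any shift has taken place, and the extra count pays for it. The sketch treats AX:BX:DX:BP as one 64-bit integer; the name `normalize64` is hypothetical. The tests repeat the two DEBUG runs above (BP = 0011h, then all zeros).

```python
MASK64 = (1 << 64) - 1


def normalize64(q: int):
    """Model of program 9; q is the quad word AX:BX:DX:BP with AX highest.

    Returns (shifted quad word, count of leading zeros), following
    MOV CX,41h / TEST AH,80h / LOOPZ back / SUB CX,40h / NEG CX.
    """
    cx = 0x41
    while True:
        msb_clear = ((q >> 63) & 1) == 0    # TEST AH,80h sets ZF when bit 63 is 0
        cx -= 1                             # LOOPZ decrements CX first ...
        if cx == 0 or not msb_clear:        # ... and loops only if CX != 0 and ZF = 1
            break
        q = (q << 1) & MASK64               # ADD BP,BP / ADC DX,DX / ADC BX,BX / ADC AX,AX
    return q, 0x40 - cx                     # SUB CX,40h then NEG CX
```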

10. A program for taking in 4 hex digits from the keyboard using DOS interrupt 21h, function 1: In the following program, we take a 4-digit hex input from the keyboard. To make the program usable in a variety of situations, it ignores any non-hex key that is pressed accidentally. Also, if a wrong digit is entered (as seen by the echo on the monitor), the user can simply go on typing the correct 4 hex digits without having to complete and restart the entry. This works because the program keeps only the last 4 hex digits entered. The entry is terminated by the character '$'. The program and its assembled version are presented below. A macro, a2h, is used for the ASCII-to-hex conversion.

                      assume cs:code
0000                  code segment
0000  B9 0004         strt:  mov cx,4          ; shift count
0003  BB 3030                mov bx,3030h
0006  8B D3                  mov dx, bx
0008  B4 01           here:  mov ah,1
000A  CD 21                  int 21h           ; input from keyboard with echo
000C  3C 24                  cmp al, '$'
000E  74 1C                  jz next           ; end of input
0010  3C 30                  cmp al, 30h
0012  72 F4                  jb here
0014  3C 3A                  cmp al,3ah
0016  72 0A                  jb next1
0018  24 DF                  and al, 0dfh      ; convert lower-to-upper case
001A  3C 41                  cmp al, 41h
001C  72 EA                  jb here
001E  3C 46                  cmp al, 46h
0020  77 E6                  ja here
0022  8A F2           next1: mov dh, dl        ; rotate regs to make way for
                                               ; fresh ascii input
0024  8A D7                  mov dl, bh
0026  8A FB                  mov bh, bl
0028  8A D8                  mov bl, al
002A  EB DC                  jmp here          ; get fresh ascii input

                      a2h macro r1             ;; ascii-to-hex conversion
                             local down0
                             cmp r1, 40h
                             jb down0
                             sub r1,7
                      down0: sub r1, 30h
                             endm

002C                  next:  a2h bl
002C  80 FB 40      1        cmp bl, 40h
002F  72 03         1        jb ??0000
0031  80 EB 07      1        sub bl,7
0034  80 EB 30      1 ??0000: sub bl, 30h
                             a2h bh
0037  80 FF 40      1        cmp bh, 40h
003A  72 03         1        jb ??0001
003C  80 EF 07      1        sub bh,7
003F  80 EF 30      1 ??0001: sub bh, 30h
0042  D2 C7                  rol bh, cl
0044  0A DF                  or bl, bh
                             a2h dl
0046  80 FA 40      1        cmp dl, 40h
0049  72 03         1        jb ??0002
004B  80 EA 07      1        sub dl,7
004E  80 EA 30      1 ??0002: sub dl, 30h
0051  8A FA                  mov bh, dl
                             a2h dh
0053  80 FE 40      1        cmp dh, 40h
0056  72 03         1        jb ??0003
0058  80 EE 07      1        sub dh,7
005B  80 EE 30      1 ??0003: sub dh, 30h
005E  D2 C6                  rol dh, cl
0060  0A FE                  or bh, dh
0062  CD 01                  int 1
0064  EB 9A                  jmp strt
0066                  code ends
                             end strt

Result of executing the program:
-g
12344Bxcdyo$         ; the input list
AX=0124 BX=4BCD CX=0004 DX=A00B SP=0000 BP=0000 SI=0000 DI=0000
DS=13B1 ES=13B1 SS=13C1 CS=13C1 IP=0064 NV UP EI NG NZ NA PO NC

Notice that the non-hex inputs are ignored and the last 4 hex keys are presented as a 4-digit hex number in register BX. The ASCII code (24h) of the character '$', which caused the program to terminate, is seen in register AL.
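The key-filtering logic can be modelled in Python to try out other key sequences without DOS. In this sketch (the name `read_hex4` is hypothetical), the keys are given as a byte string. The same tests as the program are applied, and the function returns the value the program leaves in BX. The first test uses the input list from the run above.

```python
def read_hex4(keys: bytes) -> int:
    """Model of program 10: keep the last 4 hex digits typed before '$'.

    Non-hex keys are ignored, lower-case a-f are accepted (AND AL,0DFh),
    and the digits start as ASCII '0000' (MOV BX,3030h / MOV DX,BX).
    """
    digits = bytearray(b"0000")             # DH, DL, BH, BL: oldest ... newest
    for c in keys:
        if c == ord("$"):                   # CMP AL,'$' / JZ next
            break
        if c < 0x30:                        # below '0': ignore the key
            continue
        if c >= 0x3A:                       # not a decimal digit
            c &= 0xDF                       # fold lower case to upper case
            if not 0x41 <= c <= 0x46:       # not 'A'..'F': ignore the key
                continue
        digits = digits[1:] + bytes([c])    # next1: shift the digits along
    return int(digits.decode("ascii"), 16)  # the a2h conversions and packing
```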

11. Interrupt 21h function 7: We often need to wait in the middle of a program, perhaps until the user has finished reading the material already displayed, before displaying more. Function 07 of DOS interrupt 21h is helpful here. It makes the program wait until a key is pressed and only then lets it proceed. As with function 1 of interrupt 21h, the ASCII code of the key pressed is returned in the AL register. Unlike function 1, however, function 7 does not echo the character to the standard output of the system (that is, the monitor). A simple program demonstrating this function is given below:

; A simple program - waits for a key press and
; returns 'OK' on the monitor when any key is pressed.
; File name: ok.asm

                      assume cs:code
0000                  code segment
0000  B4 07           start: mov ah, 7
0002  CD 21                  int 21h       ; wait for key press
0004  B4 02                  mov ah, 2
0006  B2 4F                  mov dl, 'O'   ; note: single or double
0008  CD 21                  int 21h       ; quotes are both okay for
000A  B2 4B                  mov dl, "K"   ; specifying characters: 'O' "K"
000C  CD 21                  int 21h
000E  B4 4C                  mov ah, 4ch   ; function call to terminate
0010  CD 21                  int 21h       ; and return to the system
0012                  code ends
                             end start

This program can be tested directly in the DOS environment as follows. First the ok.asm file is assembled and then linked to produce an executable file, ok.exe. It is then executed in the DOS environment by the simple command ok. The program is seen to wait until a key is pressed, and when that happens, the word 'OK' is displayed on the monitor. Below is a demo:

C:\DOCUME~1\acer\MYDOCU~1\MYFILE~1\REF~1.MAT\DOSPRO~1> ok
OK
C:\DOCUME~1\acer\MYDOCU~1\MYFILE~1\REF~1.MAT\DOSPRO~1>

There is no response to the ok command until a key is pressed. Then 'OK' appears, as seen above, and the program terminates, returning control to DOS.


There are several other interrupt 21H functions, many of which are useful for controlling different input/output devices. Information on these functions is readily available on the internet. They simplify I/O operations such as disk reading/writing and video display handling. It is not the purpose here to go into these ready-made routines and their use.

_____xxxx_____

EXERCISES

1. Find the logic of the following 4-digit BCD-to-hex converter program. The input is a 4-digit BCD number in register AX, and the output is in register DX. Hint: the program repeatedly divides by 2 to obtain the bits of the result.

code  segment
      assume cs: code
strt: mov cx, 16
      sub dx, dx
next: mov bx, ax
      and bx, 1110h
      shr bx, 1
      shr bx, 1
      sub ax, bx
      shr bx, 1
      sub ax, bx
      shr ax, 1
      rcr dx, 1
      loop next
      int 1
code  ends
      end strt

TESTING IN DEBUG
-u 0 1d
13DC:0000 B91000 MOV CX,0010
13DC:0003 2BD2 SUB DX,DX
13DC:0005 8BD8 MOV BX,AX
13DC:0007 81E31011 AND BX,1110
13DC:000B D1EB SHR BX,1
13DC:000D D1EB SHR BX,1
13DC:000F 2BC3 SUB AX,BX
13DC:0011 D1EB SHR BX,1
13DC:0013 2BC3 SUB AX,BX
13DC:0015 D1E8 SHR AX,1
13DC:0017 D1DA RCR DX,1
13DC:0019 E2EA LOOP 0005
13DC:001B CD01 INT 01
13DC:001D 90 NOP

-rax
AX 0000
:9999
-g

AX=0000 BX=0000 CX=0000 DX=270F SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=001D NV UP EI PL ZR NA PE NC
13DC:001D 90 NOP

The program can be improved as shown below; this also gives a partial hint as to the operations done:

; This program converts a 4-digit BCD number to binary. Input in reg AX;
; output is also returned in AX; uses regs BX, CX and DX.
; The program continuously divides the BCD data by 2
; to get the 10 lsb's. The 4 msb's are then obtained simply
; by rotating and ORing as is, further rotated right by 2 more bits
; (these bits are just 0's) to properly align the hex result.
code  segment
      assume cs: code
strt: mov cx, 10
      sub dx, dx
next: mov bx, ax
      and bx, 1110h
      shr bx, 1
      shr bx, 1
      sub ax, bx
      shr bx, 1
      sub ax, bx
      shr ax, 1
      rcr dx, 1
      loop next
      mov cl, 6
      ror ax, cl
      ror dx, cl
      or ax, dx
      int 1
code  ends
      end strt
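The arithmetic behind the hint can be checked with a Python model of both versions. This is a sketch with hypothetical function names. Each pass halves the BCD number in AX and moves the bit that falls out into DX. A digit that receives a 1 from the digit above must be corrected by 3, because that bit is worth 5 in decimal but 8 in binary; the two SUB instructions make this correction before the shift. The test covers every 4-digit BCD input, including the 9999 used in the DEBUG run above.

```python
def _halve_bcd_step(ax: int, dx: int):
    """One pass of the loop: halve the BCD number in AX, shifting its lsb into DX."""
    bx = (ax & 0x1110) >> 2     # lsb 2^(4k) of digits k = 1..3, moved to 2^(4k-2)
    ax -= bx                    # this SUB and the next remove 2^(4k-2) + 2^(4k-3),
    bx >>= 1                    # i.e. 6 in digit k-1, which becomes 3 after the
    ax -= bx                    # shift: the shifted-in bit is then worth 5, not 8
    carry = ax & 1              # SHR AX,1 puts the lsb in CF ...
    ax >>= 1
    dx = (dx >> 1) | (carry << 15)   # ... and RCR DX,1 collects it at the top
    return ax, dx


def bcd_to_bin_16(ax: int) -> int:
    """First program: 16 halvings; the binary result ends up in DX."""
    dx = 0
    for _ in range(16):
        ax, dx = _halve_bcd_step(ax, dx)
    return dx


def ror16(value: int, count: int) -> int:
    """16-bit rotate right, as ROR reg,CL."""
    return ((value >> count) | (value << (16 - count))) & 0xFFFF


def bcd_to_bin_10(ax: int) -> int:
    """Improved program: 10 halvings, then ROR AX,6 / ROR DX,6 / OR AX,DX."""
    dx = 0
    for _ in range(10):
        ax, dx = _halve_bcd_step(ax, dx)
    return ror16(ax, 6) | ror16(dx, 6)
```

After 10 halvings, what is left in AX is at most 9, a single BCD digit that is already binary. This is why the improved version can stop early and simply rotate the two parts into place.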

2. The program below reverses an array in situ, using the start and end addresses of the array in registers SI and DI. Study the logic and the clever use of the string instructions in the program. Also study the loop control, which does not use register CX. Check that the program works both for an even and for an odd number of elements in the array. Also check that, in an array with an odd number of elements, the central element is left as it is and not handled at all by the program.

; This program reverses an array in situ;
; watch the clever use of string instructions here;
; watch also the array loop control without using reg. cx
data  segment
array db 1, 2, 3, 4, 5, 6
arr_end db 7
data  ends
;
code  segment
      assume cs:code, ds:data, es:data
start: mov ax, data
      mov ds, ax
      mov es, ax
      mov si, offset array
      mov di, offset arr_end
      std
back: mov al, [di]
      xchg al, [si]
      stosb
      inc si
      cmp si, di
      jb back
      int 1
code  ends
      end start

3. Write an appropriate 8086 assembly language program to test the array-reversing macro given under section 6 of this chapter, and test the working of the program.

4. Study the various int 21H functions from the internet, and write small programs that use some of them. The site at bbc.nvg.org/doc/Master%20512%20Technical%20Guide/m512techb_int21.htm, for example, gives good information.



6. ILLUSTRATING THE POWER OF THE 8086 PROCESSOR

Introduction to Handling Complex Programs: As discussed in Chapter 3, it is necessary to have a clear idea of the steps involved in the program and to allocate suitable registers for the different variables in the program. The algorithms should be chosen so that the power of the registers and of the instruction set can be fully exploited. In this chapter, we shall see some reasonably complex programs of the number-crunching type. We will start with multi-word hex multiplication and then progress to multi-word decimal multiplication. The decimal arithmetic support in the 8086 is limited, so we will use a hybrid hex/decimal system to do the job. Addition and subtraction of multi-word numbers are relatively simple; division is difficult, as we shall see. We will also see factorial computation for large decimal numbers and the building up of a prime number table.

1. Multiplying a multi-word Hex number by a single-word Hex number: In this problem, the basic operation is a simple word-by-word multiplication. A multi-word by multi-word multiplication involves two levels of operation, rather like handling matrices. In one round of multiplication, a single word of the multiplier multiplies the complete multiplicand word string; this is what we see in this section. This operation is then repeated as many times as there are words in the multiplier string, as we will see in the next section. The first round of multiplication can easily be encapsulated in a macro, as shown below. The assembly language version, the machine language version and the testing of the program in DEBUG are all shown below for a multi-word by single-word multiplication.

code  segment
      assume cs:code, ds:code, es:code
;
mmul  macro
      local again
      xor bp, bp
again: lodsw
      mul bx
      xchg dx, bp
      add ax, dx
      adc bp, 0
      stosw
      loop again
      mov [di], bp
      endm
;
start: mov ax, cs
      mov ds, ax
      mov es, ax
      mov si, offset mpd
      mov di, offset prd
      mov bx, 0abcdh    ; multiplier
      cld
      mmul              ; cx will have to be loaded manually during execution
      int 1
;
      align 2
mpd   dw 1234h, 56feh, 67abh, 89cdh
prd   dw 5 dup (0)
;
code  ends
      end start

Testing in debug
-u 0 21
13DC:0000 8CC8 MOV AX,CS
13DC:0002 8ED8 MOV DS,AX
13DC:0004 8EC0 MOV ES,AX
13DC:0006 BE2400 MOV SI,0024
13DC:0009 BF2C00 MOV DI,002C
13DC:000C BBCDAB MOV BX,ABCD
13DC:000F FC CLD
13DC:0010 33ED XOR BP,BP
13DC:0012 AD LODSW
13DC:0013 F7E3 MUL BX
13DC:0015 87D5 XCHG DX,BP
13DC:0017 03C2 ADD AX,DX
13DC:0019 83D500 ADC BP,+00
13DC:001C AB STOSW
13DC:001D E2F3 LOOP 0012
13DC:001F 892D MOV [DI],BP
13DC:0021 CD01 INT 01
-r
AX=0000 BX=0000 CX=0036 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=0000 NV UP EI PL NZ NA PO NC
13DC:0000 8CC8 MOV AX,CS
-d cs:24 35
13DC:0020 34 12 FE 56-AB 67 CD 89 00 00 00 00 4..V.g......
13DC:0030 00 00 00 00 00 00 ......
-rcx
CX 0036
:4                   ; number of words in the multiplicand, loaded manually
-r
AX=0000 BX=ABCD CX=0004 DX=0000 SP=0000 BP=0000 SI=0024 DI=002C
DS=13CC ES=13CC SS=13DC CS=13DC IP=0000 NV UP EI PL NZ NA PO NC
13DC:0000 8CC8 MOV AX,CS
-g
AX=8DBB BX=ABCD CX=0000 DX=4592 SP=0000 BP=5C7A SI=002C DI=0034
DS=13DC ES=13DC SS=13DC CS=13DC IP=0023 NV UP EI PL NZ NA PO NC
13DC:0023 90 NOP
-d 24 35
13DC:0020 34 12 FE 56-AB 67 CD 89 A4 4F 9D 5F 4..V.g...O._
13DC:0030 50 77 BB 8D 7A 5C Pw..z\

It can easily be checked, in 2 rounds on the system's scientific calculator in hex mode, that hex 89cd 67ab 56fe 1234 multiplied by hex abcd equals hex 5c7a 8dbb 7750 5f9d 4fa4.
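A Python model of the macro gives a quick independent check of the DEBUG dump, in place of the calculator. This is a sketch: `mmul` here is a Python function mirroring the macro, and `words_to_int` is a hypothetical helper that joins little-endian words into one integer. The test data are mpd and the multiplier ABCDh from the program above.

```python
def mmul(mpd, bx):
    """Multiply the little-endian word list mpd by the single word bx.

    As in the macro, BP carries the high word of each product into the
    next step, and the final MOV [DI],BP stores it as the extra top word.
    """
    prd = []
    bp = 0                                   # XOR BP,BP
    for word in mpd:                         # LODSW ... LOOP again
        dx, ax = divmod(word * bx, 0x10000)  # MUL BX gives DX:AX
        total = ax + bp                      # XCHG DX,BP / ADD AX,DX
        prd.append(total & 0xFFFF)           # STOSW
        bp = dx + (total >> 16)              # ADC BP,0 (never exceeds FFFFh)
    prd.append(bp)                           # MOV [DI],BP
    return prd


def words_to_int(words):
    """Join little-endian 16-bit words into one integer."""
    return sum(w << (16 * i) for i, w in enumerate(words))
```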

2. Multiplication of an m-word hex number by an n-word hex number: This operation needs more registers than the 8086 provides, so we use the stack frame as an extension of the register set. We pass the parameters through the registers, but in the subroutine we put them in a stack frame, so that they can be used whenever needed without being lost. The parameters required are m, n, and the start addresses of the multiplicand array of m words and of the multiplier array of n words. The addresses of the product array and of a temporary array are passed as well. The operations involved are:

- multiplying the whole multiplicand by each word of the multiplier in turn;
- adding these partial products with proper alignment.

An additional temporary word array of (m+1) words is required to store the partial result of multiplying the complete multiplicand by a single multiplier word. As we saw in the example above, encapsulating the (word * multi-word) multiplication in a macro may not be needed, since the macro was used only once in example 1. We shall therefore write the complete operation as a subroutine, with the parameters passed through the registers of the processor.

data  segment
mpd   dw 9fedh, 8abch, 7efah, 0fdabh   ; multiplicand
      dw 252 dup (0)    ; multiplicand can go up to a total of 256 words
mpr   dw 0f123h, 9cdeh, 8754h, 1156h, 3478h, 73fbh   ; multiplier
      dw 250 dup (0)    ; multiplier can also be up to 256 words
prod  dw 512 dup (0)    ; product array has a space of 512 words
temp  dw 257 dup (?)    ; temporary use, (word*256 word) = 257 words
      dw 15 dup (0)     ; extra space
m     dw 4
n     dw 6
data  ends
;
code  segment
      assume cs:code, ds:data, es:data
strt: mov ax, data
      mov ds, ax
      mov es, ax
      mov ax, n
      mov cx, m
      mov bx, offset mpr    ; addresses
      mov si, offset mpd
      mov dx, offset temp
      mov di, offset prod
      call mmmult
      int 1
mmmult proc near
; initialising: prepare the stack frame
      push dx           ; address of temp [bp + 12]
      push bx           ; address of mpr  [bp + 10]
      push si           ; address of mpd  [bp + 8]
      push di           ; address of prod [bp + 6]
      push ax           ; value of n      [bp + 4]
      push cx           ; value of m      [bp + 2]
      push bp
      mov bp, sp
      sub ax, ax
      push ax
      push ax           ; 2 word locations for local variables in the stack
                        ; frame: [bp - 2] partial prod, [bp - 4] mpr word
                        ; position (outer loop index).
      cld
; now proceed to clear the temp space; outer loop starts here
olup: sub ax, ax
      mov cx, 257
      mov di, offset temp
      rep stosw
      mov bx, [bp + 10]
      add bx, [bp - 4]
      mov bx, [bx]
      mov di, [bp + 12]
      mov si, [bp + 8]
      mov cx, [bp + 2]
; inner loop starts now
ilup: lodsw
      mul bx
      xchg dx, [bp - 2]
      add ax, dx
      adc word ptr [bp - 2], 0
      stosw
      loop ilup
      mov ax, [bp - 2]
      stosw             ; inner loop over
      mov cx, [bp + 2]
      inc cx
      mov si, [bp + 12]
      mov di, [bp + 6]
      add di, [bp - 4]
      clc
; another loop nested inside the outer loop
comp: lodsw
      adc ax, [di]
      stosw
      loop comp
; nested loop completed
      mov [bp - 2], cx  ; clear [bp - 2]; note cx = 0 here
      add word ptr [bp - 4], 2
      mov cx, word ptr [bp + 4]
      add cx, cx
      cmp cx, word ptr [bp - 4]
      jnz olup
; outer loop over - prepare to return
      mov sp, bp        ; unwind the stack frame and clear the stack
      pop bp
      pop cx
      pop ax
      pop di
      pop si
      pop bx
      pop dx
      ret               ; and return
mmmult endp
code  ends
      end strt

-u 0 89
147F:0000 B8DC13 MOV AX,13DC
147F:0003 8ED8 MOV DS,AX
147F:0005 8EC0 MOV ES,AX
147F:0007 A1220A MOV AX,[0A22]
147F:000A 8B0E200A MOV CX,[0A20]
147F:000E BB0002 MOV BX,0200
147F:0011 BE0000 MOV SI,0000
147F:0014 BA0008 MOV DX,0800
147F:0017 BF0004 MOV DI,0400
147F:001A E80200 CALL 001F
147F:001D CD01 INT 01
147F:001F 52 PUSH DX


147F:0020 53 PUSH BX
147F:0021 56 PUSH SI
147F:0022 57 PUSH DI
147F:0023 50 PUSH AX
147F:0024 51 PUSH CX
147F:0025 55 PUSH BP
147F:0026 8BEC MOV BP,SP
147F:0028 2BC0 SUB AX,AX
147F:002A 50 PUSH AX
147F:002B 50 PUSH AX
147F:002C FC CLD
147F:002D 2BC0 SUB AX,AX
147F:002F B90101 MOV CX,0101
147F:0032 BF0008 MOV DI,0800
147F:0035 F3 REPZ
147F:0036 AB STOSW
147F:0037 8B5E0A MOV BX,[BP+0A]
147F:003A 035EFC ADD BX,[BP-04]
147F:003D 8B1F MOV BX,[BX]
147F:003F 8B7E0C MOV DI,[BP+0C]
147F:0042 8B7608 MOV SI,[BP+08]
147F:0045 8B4E02 MOV CX,[BP+02]
147F:0048 AD LODSW
147F:0049 F7E3 MUL BX
147F:004B 8756FE XCHG DX,[BP-02]
147F:004E 03C2 ADD AX,DX
147F:0050 8356FE00 ADC WORD PTR [BP-02],+00
147F:0054 AB STOSW
147F:0055 E2F1 LOOP 0048
147F:0057 8B46FE MOV AX,[BP-02]
147F:005A AB STOSW
147F:005B 8B4E02 MOV CX,[BP+02]
147F:005E 41 INC CX
147F:005F 8B760C MOV SI,[BP+0C]
147F:0062 8B7E06 MOV DI,[BP+06]
147F:0065 037EFC ADD DI,[BP-04]
147F:0068 F8 CLC
147F:0069 AD LODSW
147F:006A 1305 ADC AX,[DI]
147F:006C AB STOSW
147F:006D E2FA LOOP 0069
147F:006F 894EFE MOV [BP-02],CX
147F:0072 8346FC02 ADD WORD PTR [BP-04],+02
147F:0076 8B4E04 MOV CX,[BP+04]
147F:0079 03C9 ADD CX,CX
147F:007B 3B4EFC CMP CX,[BP-04]
147F:007E 75AD JNZ 002D
147F:0080 8BE5 MOV SP,BP
147F:0082 5D POP BP
147F:0083 59 POP CX
147F:0084 58 POP AX
147F:0085 5F POP DI
147F:0086 5E POP SI
147F:0087 5B POP BX
147F:0088 5A POP DX
147F:0089 C3 RET
-g 1a
AX=0006 BX=0200 CX=0004 DX=0800 SP=0000 BP=0000 SI=0000 DI=0400
DS=13DC ES=13DC SS=13DC CS=147F IP=001A NV UP EI PL NZ NA PO NC
147F:001A E80200 CALL 001F
-d 0 f               ; multiplicand - size 4 words - as below

159

13DC:0000 ED 9F BC 8A FA 7E AB FD-00 00 00 00 00 00 00 00 .....~..........-d 200 20f ; multiplier - size 6 words 13DC:200 23 F1 DE 9C 54 87 56 11-78 34 FB 73 00 00 00 00 #...T.V.x4.s.... -d a20 a2f ; input – data sizes in words of mpd and mpr 13DC:0A20 04 00 06 00 00 00 00 00-00 00 00 00 00 00 00 00 ................-gAX=0006 BX=0200 CX=0004 DX=0800 SP=0000 BP=0000 SI=0000 DI=0400 DS=13DC ES=13DC SS=13DC CS=147F IP=001F NV UP EI PL ZR NA PE NC 147F:001F 52 PUSH DX -d 400 41f ; prodduct output 10 (or m+n) words in size 13DC:0400 67 FA DD A5 A7 EE A3 5F-7D 71 54 30 8C 4D 55 DB g......_}qT0.MU.13DC:0410 2D F5 EC 72 00 00 00 00-00 00 00 00 00 00 00 00 -..r............

The main program and the subprogram above together occupy 8AH (138 decimal) bytes of memory; the subroutine alone (001FH to 0089H) takes only 107 bytes, with fewer than 60 instructions.

Read as big-endian hex numbers (the words are stored little-endian, low word first), the results above say that FDAB7EFA8ABC9FED multiplied by 73FB3478115687549CDEF123 is 72ECF52DDB554D8C3054717D5FA3EEA7A5DDFA67. This result can be verified with the scientific calculator of the system, though of course not as easily as in the earlier case.
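To check such results without a calculator, the word-by-word method of mmmult can be modelled in Python. This is a sketch, not the book's code: the function names (mmmult, to_words, from_words) are illustrative, and Python's unbounded integers stand in for DX:AX and the carry flag. Each multiplier word produces an (m + 1)-word row, as the inner loop does in temp, and that row is added with carry into the product at the multiplier word's position, as the nested loop comp does.

```python
MASK = 0xFFFF  # one 8086 word


def to_words(x, count):
    """Split x into `count` little-endian 16-bit words (low word first, as in memory)."""
    return [(x >> (16 * k)) & MASK for k in range(count)]


def from_words(words):
    """Rebuild an integer from little-endian 16-bit words."""
    return sum(w << (16 * k) for k, w in enumerate(words))


def mmmult(mpd, mpr):
    """Model of mmmult: multiply word lists mpd (m words) and mpr (n words)."""
    m = len(mpd)
    prod = [0] * (m + len(mpr))
    for j, b in enumerate(mpr):               # outer loop: one multiplier word
        temp, high = [], 0                    # high plays the role of [bp - 2]
        for a in mpd:                         # ilup
            p = a * b                         # mul bx -> dx:ax
            low = (p & MASK) + high           # xchg dx, [bp - 2]; add ax, dx
            temp.append(low & MASK)           # stosw
            high = (p >> 16) + (low >> 16)    # adc word ptr [bp - 2], 0
        temp.append(high)                     # mov ax, [bp - 2]; stosw
        carry = 0                             # clc
        for i, t in enumerate(temp):          # comp: add the m + 1 temp words
            s = t + prod[j + i] + carry       # lodsw; adc ax, [di]
            prod[j + i] = s & MASK            # stosw
            carry = s >> 16
    return prod


a = 0xFDAB7EFA8ABC9FED                        # 4-word multiplicand from the run above
b = 0x73FB3478115687549CDEF123                # 6-word multiplier
prod = mmmult(to_words(a, 4), to_words(b, 6))
print(" ".join(f"{w:04X}" for w in reversed(prod)))   # high word first
```

Comparing from_words(prod) with a * b, Python's own big-integer product, checks the method; the printed words can be compared with the DEBUG dump above.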

The following allocation of the data segment may be noted:
0000-01FF hex: space for the multiplicand words
0200-03FF hex: space for the multiplier words
0400-07FF hex: space for the product words
0800-0A01 hex: space for the temporary product of one multiplier word and all the multiplicand words
0A02-0A1F hex: not used
0A20 hex: multiplicand word count
0A22 hex: multiplier word count

It can therefore be observed that this program, as it is, can multiply hex numbers of up to 256 words by 256 words, that is, 4096-bit by 4096-bit binary numbers. Operations of this magnitude are needed in cryptography and other applications. An example of a 255 x 255 word (or 4080 x 4080 bit) multiplication is shown below. (Nothing prevents us from using the entire data segment, however. The multiplicand, multiplier, product and temporary arrays together need about 3m + 2n words, so two equal-sized operands of up to about 6500 words, that is, roughly 26,000 hex digits each, fit in one 64 KB segment.)

data segment
mpd     dw 255 dup (0ffffh)     ; multiplicand
        dw 0
mpr     dw 255 dup (0ffffh)     ; multiplier
        dw 0
prod    dw 512 dup (0)          ; product array
temp    dw 257 dup (?)          ; temporary use
        dw 15 dup (0)
m       dw 255
n       dw 255
data ends
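The offsets in the allocation table can be reproduced by laying the dw declarations end to end, two bytes per word. The Python sketch below assumes the declarations above exactly as written, with unnamed padding words marked None; the helper names are hypothetical. It also estimates how large equal operands could get if one whole 64 KB segment (8000H words) were used, counting mpd, mpr, prod (m + n words), temp (m + 1 words) and the two counts, and ignoring padding.

```python
def assign_offsets(fields):
    """Place (name, word_count) fields back to back, as consecutive dw
    directives are placed, and return {name: byte offset}; None = padding."""
    offsets, pos = {}, 0
    for name, words in fields:
        if name is not None:
            offsets[name] = pos
        pos += 2 * words                      # two bytes per word
    return offsets


# The 255-word test data segment declared above.
FIELDS_255 = [("mpd", 255), (None, 1), ("mpr", 255), (None, 1),
              ("prod", 512), ("temp", 257), (None, 15), ("m", 1), ("n", 1)]


def words_needed(m, n):
    """mpd + mpr + prod (m + n) + temp (m + 1) + the counts m and n: 3m + 2n + 3."""
    return m + n + (m + n) + (m + 1) + 2


def largest_equal_operands(segment_words=0x8000):
    """Largest m = n for which everything fits in one 64 KB segment."""
    m = 1
    while words_needed(m + 1, m + 1) <= segment_words:
        m += 1
    return m


for name, off in assign_offsets(FIELDS_255).items():
    print(f"{name:5s} {off:04X}")
size = largest_equal_operands()
print(size, "words =", 4 * size, "hex digits")
```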


;code segment ; this and the unassembled program are the same as shown earlier.
TESTING IN DEBUG
-g ; execute the program
AX=00FF BX=0200 CX=00FF DX=0800 SP=0000 BP=0000 SI=0000 DI=0400
DS=13DC ES=13DC SS=13DC CS=147F IP=001F NV UP EI PL ZR NA PE NC
147F:001F 52 PUSH DX
-d 0 9ff ; the displayed data are separated and labelled for the sake of clarity
; multiplicand below - 255 words
13DC:0000 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
; rows 13DC:0010 through 13DC:01E0 are identical to the row above (all FF)
13DC:01F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF 00 00 ................
; the trailing 00 bytes at 01FE, 03FE and 07FC are not part of the data or results
; multiplier below - 255 words
13DC:0200 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
; rows 13DC:0210 through 13DC:03E0 are identical to the row above (all FF)
13DC:03F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF 00 00 ................
; product below - 510 words
13DC:0400 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; rows 13DC:0410 through 13DC:05E0 are all 00
13DC:05F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 FE FF ................
13DC:0600 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
; rows 13DC:0610 through 13DC:07E0 are all FF
13DC:07F0 FF FF FF FF FF FF FF FF-FF FF FF FF 00 00 00 00 ................
; data in temp location - 256 words - last word-multiply result
13DC:0800 01 00 FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
; rows 13DC:0810 through 13DC:09E0 are all FF
13DC:09F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FE FF ................
-q
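The dump above can be predicted exactly. With x = 2^4080 - 1 (255 words of FFFF), x squared is 2^8160 - 2^4081 + 1, and the last row left in temp is FFFF times x, that is, 2^4096 - 2^4080 - 2^16 + 1. The short Python check below (names are illustrative) splits both numbers into little-endian words, the order in which DEBUG displays them.

```python
WORDS = 255                                    # words per operand in the test run
x = (1 << (16 * WORDS)) - 1                    # 255 words of FFFF each


def words_of(value, count):
    """Little-endian 16-bit words of value, low word first as in memory."""
    return [(value >> (16 * k)) & 0xFFFF for k in range(count)]


prod = words_of(x * x, 2 * WORDS)              # 510-word product at 0400H
temp = words_of(0xFFFF * x, WORDS + 1)         # last row (multiplier word FFFF) at 0800H

print(f"prod[0]={prod[0]:04X} prod[255]={prod[255]:04X} at {0x400 + 2 * 255:04X}H")
print(f"temp[0]={temp[0]:04X} temp[255]={temp[255]:04X} at {0x800 + 2 * 255:04X}H")
# prod[0]=0001 prod[255]=FFFE at 05FEH
# temp[0]=0001 temp[255]=FFFE at 09FEH
```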

3. Handling Large BCD Numbers: Intel 8086 provides a minimal facility for handling decimal numbers, but when we think of large decimal numbers in the BCD representation, these facilities are very inadequate. However, it is possible to handle BCD directly, while keeping the computations in hex at word level and using a decimal power for inter-word operations. I call this the method of power BCD. The meaning will become clear if we consider an example. Suppose we want to multiply the BCD numbers 1234 and 5678. We may multiply the hex equivalent of 1234 by the hex equivalent of 5678, and divide the result by 10000 decimal (10^4 for 4-digit BCD computations) to get two hex words. If we convert these two hex words to their BCD equivalents, we get the complete product in BCD. Our first step is to convert 1234 and 5678 into their hex equivalents, which turn out to be 4D2 and 162E. These two hex numbers can be multiplied directly to obtain the hex product 6AE9BC. Dividing this by 10000 (2710 hex), we obtain the quotient 2BC and the remainder 19FC. Converting these two hex numbers to BCD gives 0700 6652, the complete product in BCD form. This can be called the BCD4 method, as each group of 4 BCD digits is handled as one hex word. Two conversion processes become necessary here: first, we convert the BCD words to hex; second, we convert hex words less than 10000 (decimal) back to BCD. We have done these conversions already, and in this context they can be frozen into macros in an optimized fashion, to be invoked whenever needed. The two macros are shown below:
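The 1234 x 5678 example can be replayed in a few lines of Python. Here the BCD-to-hex and hex-to-BCD conversions are done with Python string formatting, purely for illustration; they are not the 8086 macros, and the helper names are hypothetical.

```python
def bcd_word_to_int(w):
    """Read a packed BCD word such as 0x1234 as the decimal number 1234."""
    return int(f"{w:04x}")


def int_to_bcd_word(v):
    """Pack a number below 10000 as a BCD word, e.g. 700 -> 0x0700."""
    return int(f"{v:04d}", 16)


a = bcd_word_to_int(0x1234)          # 1234 = 4D2H
b = bcd_word_to_int(0x5678)          # 5678 = 162EH
p = a * b                            # 6AE9BCH
high, low = divmod(p, 10000)         # quotient 2BCH, remainder 19FCH
print(f"{a:X} x {b:X} = {p:X}; quotient {high:X}, remainder {low:X}")
print(f"BCD product: {int_to_bcd_word(high):04X} {int_to_bcd_word(low):04X}")
# 4D2 x 162E = 6AE9BC; quotient 2BC, remainder 19FC
# BCD product: 0700 6652
```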

condh   macro           ;; macro to convert a BCD decimal word to a hex word
        mov bx, ax      ;; the data for conversion is assumed in ax
        and ax, 0f0f0h
        ror ax, 1
        ror ax, 1
        sub bx, ax
        ror ax, 1
        sub bx, ax      ;; the given number is now converted to a concatenation
                        ;; of two hex bytes; BCD 9863, for example, becomes the
                        ;; hybrid (62)(3F), where 62H = 98 BCD and 3FH = 63 BCD
        mov al, 100
        mul bh
        sub bh, bh
        add ax, bx      ;; result of conversion in ax
        endm

conhd   macro           ;; macro to convert a hex word < 2710H to a BCD word
                        ;; input hex word assumed in ax
        mov bl, 100
        div bl
        mov bl, ah
        aam
        xchg ax, bx
        aam
        xchg ah, bl
        shl bx, 1
        shl bx, 1
        shl bx, 1
        shl bx, 1
        add ax, bx      ;; output BCD in ax; the macro uses only bx besides ax
        endm
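To see why condh works, note that a packed BCD byte with digits H and L holds 16H + L; subtracting 4H and then 2H leaves 10H + L, the binary value of the two digits. The Python model below mirrors the macros step by step (the function names follow the macros, but the code is an illustrative model, not the book's), and an exhaustive round trip over 0 to 9999 confirms that conhd and condh are inverses.

```python
def hybrid(bcd):
    """First half of condh: turn each packed BCD byte 16*H + L into 10*H + L.

    and ax, 0f0f0h keeps 16*H in each byte; two rors give 4*H and a third
    gives 2*H.  Because the low nibbles were cleared, no bit crosses a byte
    or wraps around the word, so plain right shifts model the rotates.
    """
    tens = bcd & 0xF0F0
    return bcd - (tens >> 2) - (tens >> 3)    # 16H + L - 4H - 2H = 10H + L


def condh(bcd):
    """Model of condh: packed BCD word (e.g. 0x9863) -> binary value (9863)."""
    h = hybrid(bcd)                           # e.g. 0x9863 -> 0x623F
    return 100 * (h >> 8) + (h & 0xFF)        # mov al,100; mul bh; sub bh,bh; add ax,bx


def conhd(value):
    """Model of conhd: binary value below 10000 (2710H) -> packed BCD word."""
    hundreds, rest = divmod(value, 100)       # mov bl,100; div bl
    d3, d2 = divmod(hundreds, 10)             # aam on the quotient
    d1, d0 = divmod(rest, 10)                 # aam on the remainder
    return (d3 << 12) | (d2 << 8) | (d1 << 4) | d0   # shl bx,1 four times; add ax,bx


print(f"{hybrid(0x9863):04X} {condh(0x9863)} {conhd(9863):04X}")   # 623F 9863 9863
```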

Both the above macros are reasonably optimal and need only the BX register in addition to AX for the conversion; the input and the converted output are both in register AX. Now we should look at the multiplication of two such words to produce a result made of hex words such that converting each hex word to BCD and concatenating them produces the decimal digit string of the product. This corresponds to the operation in the inner loop of the previous program. That program can easily be modified by keeping the number 2710H as an additional temporary in the stack frame, at, say, [BP - 6]. The inner loop then becomes:

ilup:   lodsw
        mul bx
        add ax, word ptr [bp - 2]   ; [bp - 2] contains the number of 10000's carried
                                    ; from the previous multiplication
        adc dx, 0
        div word ptr [bp - 6]       ; [bp - 6] has 10000 decimal (2710h)
        xchg ax, dx
        mov word ptr [bp - 2], dx   ; 10000's saved for the next word multiplication
        stosw
        loop ilup
        mov ax, word ptr [bp - 2]
        stosw
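The modified inner loop can be modelled in Python as below; this is a sketch with illustrative names, with divmod standing in for DIV. The docstring records why DIV cannot overflow here: the quotient carried to the next word never exceeds 9999.

```python
BASE = 10000  # 2710H, kept at [bp - 6]


def from_bcd4(words):
    """Value of little-endian BCD4 words (each 0..9999)."""
    return sum(w * BASE**k for k, w in enumerate(words))


def row_times_word(mpd, b):
    """Model of the modified inner loop: BCD4 words mpd times one BCD4 word b.

    Each step forms dx:ax = a*b + carry and divides by 10000: the remainder is
    stored and the quotient is carried.  Since a, b and carry are at most 9999,
    the dividend is at most 9999*9999 + 9999 = 99,990,000, so the quotient
    never exceeds 9999 and DIV cannot overflow.
    """
    temp, carry = [], 0                          # carry plays the role of [bp - 2]
    for a in mpd:                                # lodsw
        carry, r = divmod(a * b + carry, BASE)   # mul bx; add/adc; div [bp - 6]
        temp.append(r)                           # xchg ax, dx; stosw
    temp.append(carry)                           # mov ax, [bp - 2]; stosw
    return temp


print(row_times_word([9999, 9999, 9999], 9999))  # [1, 9999, 9999, 9998]
```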

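Simple addition of two words in this power BCD4 system is shown in the following program:

BCD4_add:
        add ax, bx
        add ax, 0d8f0h          ; 0d8f0h is the negative of 2710h (10000 decimal)
        jc down
        sub ax, 0d8f0h
down:

The above program leaves the data in AX corrected for BCD4 addition, along with the proper carry in the flag register for the BCD4 add operation. If the sum is 10000 or more, adding 0D8F0H (65536 - 10000) produces a carry and leaves the sum minus 10000 in AX; otherwise there is no carry, and subtracting 0D8F0H restores the sum without a borrow, so the carry flag stays clear.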
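A Python model of BCD4_add, written in the ADC form used later inside the comp loop (carry_in is 0 for the plain program), makes the carry behaviour explicit. The 16-bit register is imitated by masking with FFFFH; the function and constant names are illustrative.

```python
NEG_10000 = 0xD8F0       # 65536 - 10000, the two's complement of 2710H


def bcd4_add(a, b, carry_in=0):
    """Model of BCD4_add on BCD4 words a, b (0..9999); returns (ax, carry flag)."""
    ax = a + b + carry_in                  # add/adc: at most 19999, no 16-bit carry yet
    ax += NEG_10000                        # add ax, 0d8f0h
    cf = ax >> 16                          # carry out of the 16-bit register
    ax &= 0xFFFF
    if not cf:                             # jc down not taken: the sum was below 10000
        ax -= NEG_10000                    # sub ax, 0d8f0h restores the sum, no borrow
    return ax, cf


print(bcd4_add(5000, 5000), bcd4_add(4999, 5000), bcd4_add(9999, 9999, 1))
# (0, 1) (9999, 0) (9999, 1)
```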

In the above programs, the instructions that work with 10000 (the division by [BP - 6] in the loop, and the ADD, JC and SUB with 0D8F0H in the addition) are the extra work needed for decimal operation. Everything else remains the same as in the hex program, except for the initial conversion of the BCD data and the final conversion of the BCD4 result back to BCD. If other operations are also required on the BCD numbers, the power BCD representation can conveniently be used throughout without much difficulty.

The decimal m-word by n-word multiplication is done in the program below, on almost the same lines as the hex multiplication already considered. First, the BCD multiplicand input is converted to BCD4, stored elsewhere in memory, and used for the multiplication. The multiplier is converted one word at a time, on the fly, as each word is taken up for multiplication. The program thus leaves both the multiplier and the multiplicand inputs undisturbed in memory. The multiplicand and multiplier in this demo program are both 16384 decimal digits (4096 words) long, and the product is double this size. Note that the data stored in memory is hexadecimal (or binary). The word stored in memory for (BCD) 9998 is 1001 1001 1001 1000 in binary; it is our interpretation that its value is not hex 9998 but decimal 9998. When converted to BCD4 form, this word becomes 270E, which is, of course, hex.
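The whole decimal multiplication can be sketched in Python on a small scale. This is an illustrative model under stated assumptions, not the book's program: Python string formatting replaces the condh and conhd macros, Python lists replace the mpd, temp and prod arrays, and the function names are hypothetical. The structure follows the description: the multiplicand is converted once, each multiplier word is converted on the fly, each row is built with division by 10000 and added in with BCD4 carries, and the product is converted back to BCD at the end.

```python
BASE = 10000


def bcd_to_bcd4(w):
    """Packed BCD word -> BCD4 word, e.g. 0x9998 -> 9998 (= 270EH)."""
    return int(f"{w:04x}")


def bcd4_to_bcd(v):
    """BCD4 word (0..9999) -> packed BCD word."""
    return int(f"{v:04d}", 16)


def bcd_value(words):
    """Decimal value of little-endian packed BCD words."""
    return int("".join(f"{w:04x}" for w in reversed(words)))


def dmmult(mpdraw, mpr):
    """Model of dmmult on lists of packed BCD words (low word first)."""
    mpd = [bcd_to_bcd4(w) for w in mpdraw]        # conlup1: convert mpd once
    prod = [0] * (len(mpd) + len(mpr))
    for j, raw in enumerate(mpr):                  # outer loop
        b = bcd_to_bcd4(raw)                       # multiplier word converted on the fly
        temp, carry = [], 0
        for a in mpd:                              # ilup: divide by 10000
            carry, r = divmod(a * b + carry, BASE)
            temp.append(r)
        temp.append(carry)
        cf = 0
        for i, t in enumerate(temp):               # comp: power BCD addition
            s = t + prod[j + i] + cf
            cf = 1 if s >= BASE else 0
            prod[j + i] = s - BASE * cf
    return [bcd4_to_bcd(v) for v in prod]          # conlup2: back to BCD


# A 4-word version of the demo: 9999999999999998 x 9999999999999997
result = dmmult([0x9998] + [0x9999] * 3, [0x9997] + [0x9999] * 3)
print(" ".join(f"{w:04X}" for w in reversed(result)))
# 9999 9999 9999 9995 0000 0000 0000 0006
```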

data segment
mpdraw  dw 9998h                ; raw (BCD) array for the multiplicand starts here
        dw 4095 dup (9999h)     ; continuation of mpdraw
mpr     dw 9997h, 4095 dup (9999h)
prod    dw 8192 dup (0)         ; cleared product array
mpd     dw 4096 dup (?)         ; mpd array in BCD4 form, used for the computations
temp    dw 4097 dup (?)         ; reserved for temporary use
        dw 15 dup (0)
m       dw 4096                 ; no. of mpd words
n       dw 4096                 ; no. of mpr words
data ends
;
code segment
        assume cs:code, ds:data, es:data
condh   macro
        mov bx, ax
        and ax, 0f0f0h          ;; for digit separation
        ror ax, 1
        ror ax, 1
        sub bx, ax
        ror ax, 1
        sub bx, ax
        mov al, 100
        mul bh
        sub bh, bh
        endm                    ;; note: the final add ax, bx is omitted in this version
;
conhd   macro
        mov bl, 100
        div bl
        mov bl, ah
        aam
        xchg ax, bx
        aam
        xchg ah, bl
        shl bx, 1
        shl bx, 1
        shl bx, 1
        shl bx, 1
        add ax, bx
        endm
;
strt:   mov ax, data
        mov ds, ax
        mov es, ax
        mov ax, n               ; count of words of mpr
        mov cx, m               ; count of words of mpd
        mov bx, offset mpr      ; addresses
        mov si, offset prod
        mov dx, offset temp
        mov di, offset mpd
        mov bp, offset mpdraw
        call dmmult
        int 1
dmmult  proc near
; initialising: prepare the stack frame
        push dx                 ; address of temp @ [bp + 12]
        push bx                 ; address of mpr @ [bp + 10]
        push di                 ; address of mpd @ [bp + 8]
        push si                 ; address of prod @ [bp + 6]
        push ax                 ; value of n @ [bp + 4]
        push cx                 ; value of m @ [bp + 2]
        push bp                 ; address of mpdraw @ [bp]
        mov bp, sp
        sub ax, ax
        push ax
        push ax                 ; 2 word locations for local variables in the stack
                                ; frame: [bp - 2] partial prod, [bp - 4] mpr word
                                ; position (outer loop index)
        mov ax, 2710h
        push ax                 ; [bp - 6] stores decimal 10000
        cld
; conversion of raw mpd data to the BCD4 form
        mov si, [bp]            ; address of mpdraw
conlup1: lodsw                  ; note di and cx are properly loaded at entry to the proc
        condh
        add ax, bx              ; BCD4 in ax now
        stosw
        loop conlup1            ; conversion complete for mpd
; now clear the temp space
olup:   sub ax, ax              ; outer loop
        mov cx, 257
        mov di, [bp + 12]
        rep stosw
; the next multiplier word is taken up
        mov bx, [bp + 10]
        add bx, [bp - 4]
        mov ax, [bx]            ; this is now to be converted to BCD4 form
        condh
        add bx, ax              ; BCD4 in bx now
        mov di, [bp + 12]
        mov si, [bp + 8]
        mov cx, [bp + 2]
;
ilup:   lodsw                   ; inner loop
        mul bx
        add ax, [bp - 2]
        adc dx, 0
        div word ptr [bp - 6]   ; div by 10000
        xchg ax, dx
        mov [bp - 2], dx
        stosw
        loop ilup
        mov ax, [bp - 2]
        stosw                   ; inner loop over
        mov cx, [bp + 2]
        inc cx
        mov si, [bp + 12]
        mov di, [bp + 6]
        add di, [bp - 4]
        clc
; another loop nested inside the outer loop;
; note the modification for power BCD addition below
comp:   lodsw
        adc ax, [di]
        add ax, 0d8f0h          ; 2's complement of 2710h
        jc ddown
        sub ax, 0d8f0h          ; there will not be any carry from this!
ddown:  stosw
        loop comp               ; inner nested loop completed
        mov [bp - 2], cx        ; clear [bp - 2]
        add word ptr [bp - 4], 2
        mov cx, word ptr [bp + 4]
        add cx, cx
        cmp cx, word ptr [bp - 4]

        jnz olup
; outer loop over - prepare to convert the result to BCD
        mov di, [bp + 6]        ; prod address
        mov cx, [bp + 2]
        add cx, [bp + 4]
conlup2: mov ax, [di]
        conhd
        stosw
        loop conlup2            ; conversion over here, now prepare to return
        mov sp, bp
        pop bp
        pop cx
        pop ax
        pop si
        pop di
        pop bx
        pop dx
        ret                     ; and return
dmmult  endp
code ends
        end strt

;TESTING IN DEBUG
-u 0 f6
149F:0000 B8DC13 MOV AX,13DC
149F:0003 8ED8 MOV DS,AX
149F:0005 8EC0 MOV ES,AX
149F:0007 A122C0 MOV AX,[C022]
149F:000A 8B0E20C0 MOV CX,[C020]
149F:000E BB0020 MOV BX,2000
149F:0011 BE0040 MOV SI,4000
149F:0014 BA00A0 MOV DX,A000
149F:0017 BF0080 MOV DI,8000
149F:001A BD0000 MOV BP,0000
149F:001D E80200 CALL 0022
149F:0020 CD01 INT 01
149F:0022 52 PUSH DX
149F:0023 53 PUSH BX
149F:0024 57 PUSH DI
149F:0025 56 PUSH SI
149F:0026 50 PUSH AX
149F:0027 51 PUSH CX
149F:0028 55 PUSH BP
149F:0029 8BEC MOV BP,SP
149F:002B 2BC0 SUB AX,AX
149F:002D 50 PUSH AX
149F:002E 50 PUSH AX
149F:002F B81027 MOV AX,2710
149F:0032 50 PUSH AX
149F:0033 FC CLD
149F:0034 8B7600 MOV SI,[BP+00]
149F:0037 AD LODSW
149F:0038 8BD8 MOV BX,AX
149F:003A 25F0F0 AND AX,F0F0
149F:003D D1C8 ROR AX,1
149F:003F D1C8 ROR AX,1
149F:0041 2BD8 SUB BX,AX
149F:0043 D1C8 ROR AX,1
149F:0045 2BD8 SUB BX,AX
149F:0047 B064 MOV AL,64
149F:0049 F6E7 MUL BH
149F:004B 2AFF SUB BH,BH
149F:004D 03C3 ADD AX,BX
149F:004F AB STOSW
149F:0050 E2E5 LOOP 0037
149F:0052 2BC0 SUB AX,AX
149F:0054 B90101 MOV CX,0101
149F:0057 8B7E0C MOV DI,[BP+0C]
149F:005A F3 REPZ
149F:005B AB STOSW
149F:005C 8B5E0A MOV BX,[BP+0A]
149F:005F 035EFC ADD BX,[BP-04]
149F:0062 8B07 MOV AX,[BX]
149F:0064 8BD8 MOV BX,AX
149F:0066 25F0F0 AND AX,F0F0
149F:0069 D1C8 ROR AX,1
149F:006B D1C8 ROR AX,1
149F:006D 2BD8 SUB BX,AX
149F:006F D1C8 ROR AX,1
149F:0071 2BD8 SUB BX,AX
149F:0073 B064 MOV AL,64
149F:0075 F6E7 MUL BH
149F:0077 2AFF SUB BH,BH
149F:0079 03D8 ADD BX,AX
149F:007B 8B7E0C MOV DI,[BP+0C]
149F:007E 8B7608 MOV SI,[BP+08]
149F:0081 8B4E02 MOV CX,[BP+02]
149F:0084 AD LODSW
149F:0085 F7E3 MUL BX
149F:0087 0346FE ADD AX,[BP-02]
149F:008A 83D200 ADC DX,+00
149F:008D F776FA DIV WORD PTR [BP-06]
149F:0090 92 XCHG DX,AX
149F:0091 8956FE MOV [BP-02],DX
149F:0094 AB STOSW
149F:0095 E2ED LOOP 0084
149F:0097 8B46FE MOV AX,[BP-02]
149F:009A AB STOSW
149F:009B 8B4E02 MOV CX,[BP+02]
149F:009E 41 INC CX
149F:009F 8B760C MOV SI,[BP+0C]
149F:00A2 8B7E06 MOV DI,[BP+06]
149F:00A5 037EFC ADD DI,[BP-04]
149F:00A8 F8 CLC
149F:00A9 AD LODSW
149F:00AA 1305 ADC AX,[DI]
149F:00AC 05F0D8 ADD AX,D8F0
149F:00AF 7203 JB 00B4
149F:00B1 2DF0D8 SUB AX,D8F0
149F:00B4 AB STOSW
149F:00B5 E2F2 LOOP 00A9
149F:00B7 894EFE MOV [BP-02],CX
149F:00BA 8346FC02 ADD WORD PTR [BP-04],+02
149F:00BE 8B4E04 MOV CX,[BP+04]
149F:00C1 03C9 ADD CX,CX
149F:00C3 3B4EFC CMP CX,[BP-04]
149F:00C6 758A JNZ 0052
149F:00C8 8B7E06 MOV DI,[BP+06]
149F:00CB 8B4E02 MOV CX,[BP+02]
149F:00CE 034E04 ADD CX,[BP+04]
149F:00D1 8B05 MOV AX,[DI]
149F:00D3 B364 MOV BL,64
149F:00D5 F6F3 DIV BL
149F:00D7 8ADC MOV BL,AH
149F:00D9 D40A AAM
149F:00DB 93 XCHG BX,AX
149F:00DC D40A AAM
149F:00DE 86E3 XCHG AH,BL
149F:00E0 D1E3 SHL BX,1
149F:00E2 D1E3 SHL BX,1
149F:00E4 D1E3 SHL BX,1
149F:00E6 D1E3 SHL BX,1
149F:00E8 03C3 ADD AX,BX
149F:00EA AB STOSW
149F:00EB E2E4 LOOP 00D1
149F:00ED 8BE5 MOV SP,BP
149F:00EF 5D POP BP
149F:00F0 59 POP CX
149F:00F1 58 POP AX
149F:00F2 5E POP SI
149F:00F3 5F POP DI
149F:00F4 5B POP BX
149F:00F5 5A POP DX
149F:00F6 C3 RET

-g 1d
AX=1000 BX=2000 CX=1000 DX=A000 SP=0000 BP=0000 SI=4000 DI=8000
DS=13DC ES=13DC SS=13DC CS=1FDF IP=001D NV UP EI PL NZ NA PO NC
1FDF:001D E80200 CALL 0022 ; this is before entry to the subroutine
; partial view of the data segment at entry to the subroutine
-d 0 f
13DC:0000 98 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
-d 2000 200f
13DC:2000 97 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
-d 4000 400f
13DC:4000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d 6000 600f
13DC:6000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d 8000 800f
13DC:8000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d a000 a00f
13DC:A000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d c000 c02f
13DC:C000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:C010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:C020 00 10 00 10 00 00 00 00-00 00 00 00 00 00 00 00 ................ ; m, n
-g ; go and complete the subroutine and stop at 'int 1'
AX=1000 BX=2000 CX=1000 DX=A000 SP=0000 BP=0000 SI=4000 DI=8000
DS=13DC ES=13DC SS=13DC CS=1FDF IP=0022 NV UP EI NG NZ NA PE NC
1FDF:0022 52 PUSH DX ; this is after exit from the subroutine
-d 0 c02f ; full display from this command, but only some portions are shown below
; view of the relevant portions of the data segment
; first, the multiplicand in BCD (1000h words = 16384 decimal digits)
13DC:0000 98 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
13DC:0010 ; the memory from here to the line shown next is filled entirely with 99
13DC:1FF0 99 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
; multiplicand up to this
; now the multiplier in BCD, the same size as the multiplicand
13DC:2000 97 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
13DC:2010 ; the multiplier is also filled with 99 up to the line shown below
13DC:3FF0 99 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
; except for the lowest words of mpd and mpr (9998 and 9997), all words are decimal 9999
; now the result
13DC:4000 06 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:4010 ; the result in this memory area is all 0's
13DC:5FF0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:6000 95 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
13DC:6010 ; the intervening memory contains all 99's
13DC:7FF0 99 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
; the words 0006 at 4000 and 9995 at 6000 indicate the correctness of the calculation;
; the rest of the words are 0000 in the first half and 9999 in the second half, as they should be
; below is the BCD4 conversion of the multiplicand
13DC:8000 0E 27 0F 27 0F 27 0F 27-0F 27 0F 27 0F 27 0F 27 .'.'.'.'.'.'.'.'
; the data stored in the intervening memory are all 270F's
13DC:9FF0 0F 27 0F 27 0F 27 0F 27-0F 27 0F 27 0F 27 0F 27 .'.'.'.'.'.'.'.'
; below is the result of multiplying the multiplicand by the last multiplier word (a 4097-word result)
13DC:A000 02 00 0E 27 0F 27 0F 27-0F 27 0F 27 0F 27 0F 27 ...'.'.'.'.'.'.'
13DC:A010 ; the data stored in the intervening memory are all 270F's
13DC:BFF0 0F 27 0F 27 0F 27 0F 27-0F 27 0F 27 0F 27 0F 27 .'.'.'.'.'.'.'.'
13DC:C000 0E 27 00 00 00 00 00 00-00 00 00 00 00 00 00 00 .'..............
13DC:C010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:C020 00 10 00 10 00 00 00 00-00 00 00 00 00 00 00 00 ................
; the word at C000 is the last (4097th) word of the last partial product;
; the 2 words at C020 and C022 hold the m and n values
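The pattern in this dump follows from simple algebra. The multiplicand is 10^16384 - 2 and the multiplier 10^16384 - 3, so the product is 10^32768 - 5·10^16384 + 6: word 0 is 0006, the rest of the lower half is 0000, the lowest word of the upper half is 9995 and the rest are 9999. The last row in temp is 9999·(10^16384 - 2). The Python check below (names are illustrative) splits these numbers into base-10000 words, the BCD4 form, to confirm what the dump shows.

```python
DIGITS = 16384                         # 4096 words x 4 decimal digits
mpd_value = 10**DIGITS - 2             # 9999...9998
mpr_value = 10**DIGITS - 3             # 9999...9997


def bcd4_words(x, count):
    """Little-endian base-10000 (BCD4) words of x."""
    words = []
    for _ in range(count):
        x, r = divmod(x, 10000)
        words.append(r)
    return words


prod = bcd4_words(mpd_value * mpr_value, 8192)      # the product, 8192 words
temp = bcd4_words(mpd_value * 9999, 4097)           # last row: top multiplier word 9999
print(prod[0], prod[4095], prod[4096], prod[8191])  # 6 0 9995 9999
print(f"{temp[0]:04X} {temp[1]:04X} {temp[2]:04X} {temp[4096]:04X}")  # 0002 270E 270F 270E
```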

4. Factorial of large numbers in decimal, displayed in the big-endian fashion: Utilising the ideas presented so far, the program below finds the factorial of numbers up to 7DB1h (32177 decimal) and displays it in decimal in the big-endian fashion. The multiplication part here is simpler, as at each step only one word has to be multiplied with a BCD4 word string. If numbers larger than 7DB1h are used, the results do not fit into the 64 KB of the data segment; with the hexadecimal number system, factorials of larger numbers could certainly be handled. The result is left in the data segment, with its end indicated by an FFFFh word appearing after the result. This can be done by initially filling all the result space with FFFFh and overwriting it as we proceed, but it is better to write FFFF just once, at the end, immediately after the result, and step 3 of the program below does this. The program is given below with a simple demonstration of its working. At the end, to get the big-endian display, the array is simply reversed in situ, an operation we have already studied.
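A Python model of fact, with illustrative names, follows the same steps: the number is stored as one or two BCD4 words, the word string is multiplied by successively smaller numbers with the residue carried through division by 10000 (divmod stands in for DIV), low-order zero words are skipped as repz scasw does, and the result is read out high word first. The last function estimates the number of BCD4 words n! needs, using math.lgamma; under the assumption that the result and its FFFF flag word must fit in the 8000H words reserved, it suggests that 7DB1h (32177) is the largest usable number.

```python
import math

BASE = 10000


def store(words, value):
    """The check/down/store tail: append value (< 65536) as one or two BCD4 words."""
    words.append(value % BASE)
    if value >= BASE:
        words.append(value // BASE)


def factorial_bcd4(n):
    """Model of fact: n! as little-endian BCD4 words."""
    words = []
    store(words, n if n else 1)              # step 1 (0! is stored as 1)
    multiplier = n                           # bp
    while multiplier > 2:                    # process: cmp bp, 2 / jbe step3
        multiplier -= 1                      # dec bp
        start = 0
        while words[start] == 0:             # repz scasw: skip low-order zero words
            start += 1
        residue = 0                          # si
        for i in range(start, len(words)):   # mult loop: mul bp; add si; div bx
            residue, words[i] = divmod(words[i] * multiplier + residue, BASE)
        if residue:                          # a higher part is left over
            store(words, residue)
    return words


def big_endian_decimal(words):
    """Read the BCD4 words out high word first, as step 3 arranges them."""
    return "".join(f"{w:04d}" for w in reversed(words)).lstrip("0") or "0"


def words_needed(n):
    """Estimated BCD4 words in n!: ceil(decimal digits / 4), digits from lgamma."""
    digits = math.floor(math.lgamma(n + 1) / math.log(10)) + 1
    return -(-digits // 4)


print(big_endian_decimal(factorial_bcd4(25)))   # 15511210043330985984000000
print(words_needed(32177), words_needed(32178))
```

For n = 32177 the estimate gives 32767 words, so together with the flag word the result fills the 8000H-word segment exactly; 32178 would need 32769 words.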

The .asm file for finding the factorial:

data segment
        dw 8000h dup(?)         ; space reserved for result
data ends
stack segment stack
        dw 256 dup(?)
tos     label word
stack ends
code segment
        assume cs:code, ds:data, es:data, ss:stack
start:  mov ax, data
        mov es, ax
        mov ds, ax
        mov ax, stack
        mov ss, ax
        mov sp, offset tos      ; stack initialization
again1: int 1                   ; now load into ax any number whose factorial is to be
                                ; found (the number must be less than or equal to 7db1 hex)
        call fact
        jmp again1              ; get the factorial of another number; load it in ax
fact    proc near
; This procedure can find the factorial of any number up to 7db1 hex, or
; 32177 decimal. It works basically in the following steps.
; Step 1 stores the data input in ax, in 2 memory words if necessary, in
; BCD4 format in the little-endian fashion. This operation is also common
; to the multiplication of step 2, and so it comes at the last part of step 2.
; Step 2 multiplies successively by a number one less than the one used in
; the previous multiplication.
; Before concluding, step 3 marks the end of the result with a flag word FFFF
; and converts BCD4 to normal BCD format, so that the final result is in BCD
; in the big-endian fashion.
; The procedure assumes ds and es point to the same segment in memory.
; step 1
        mov bp, ax              ; save ax in bp
        cld
        sub di, di              ; initialize di to 0
        mov cx, di              ; clear cx, so that the store of the initial data value is OK
        mov bx, 10000           ; for BCD4 handling
        or ax, ax               ; ax = 0?
        jnz check               ; if no, do further check (in the later part of step 2)
        inc ax                  ; ax is 0, so put its factorial, 1, in ax
        jmp store               ; go to store the result (in step 2)
; step 2
; di has 0, the start address of the multiplicand; bp has the multiplier
; in hex; both multiplier and multiplicand are thus in BCD4 form, so the
; multiplication in BCD4 form is carried out. cx has the number of words
; in the result (initialised to 0). si has 0 initially.
mult2:  sub ax, ax
        repz scasw              ; this avoids MUL on initial 0's in the mult. process
        inc cx                  ; this and the next instruction compensate for one extra
        sub di, 2               ; operation done on these 2 regs. by the repz instruction
mult:   mov ax, word ptr [di]
mult1:  mul bp
        add ax, si              ; si has the residue from the previous mult
        adc dx, 0               ; higher word of product
        div bx                  ; BCD4 conversion
        mov si, ax
        mov ax, dx
        stosw
        loop mult               ; on exit from the loop cx = 0
        mov ax, si              ; check if a higher part is there in the result
        or ax, ax
        jz process              ; if there is no higher part, go to process;
                                ; else check ax as shown next
; the store from now on is the same as the initial input store,
; so this forms part 2 of step 1
check:  cmp ax, bx              ; 10,000 or greater?
        jb down                 ; if no, go down
        sub ax, bx              ; the check loop is essentially a divide by 10,000
        inc cx                  ; note, cx is initialised to 0 in the mult loop (or at
                                ; the start, before storing the given data)
        jmp check
down:   stosw
        jcxz process            ; if cx = 0, data store over, go to further process
        mov ax, cx              ; else do the store of the next BCD4 digit
store:  stosw
process: cmp bp, 2              ; check for the termination of the process
        jbe step3               ; if below or equal to 2, process over; go to convert
                                ; BCD4 to BCD and do the big-endian store
        dec bp                  ; else continue to dec and multiply
        mov cx, di              ; get word count in cx
        ror cx, 1               ; di has the memory byte address, which is double
                                ; the word count, so divide by 2 using a right rotate
        sub di, di              ; get start address in di
        mov si, di              ; make initial residue 0
        jmp mult2               ; proceed to multiply
; step 3. This step is used to convert the result in BCD4 to regular BCD
; and store it in the big-endian fashion so as to make it easy to view.
; Note di at this point has an address 2 more than the last word stored.
step3:  mov ax, -1              ; to flag the end of data
        mov [di], ax
convert: mov bx, 100            ; arranging for conversion
        mov cx, 4               ; shift count for conversion
        sub si, si              ; get start address in si
again:  lodsw                   ; get the BCD4 number
        or ax, ax
        jz skip                 ; if ax is 0, skip conversion
        call conv               ; BCD4 to BCD conversion routine
skip:   sub di, 2               ; get the last unconverted word address
        cmp si, di
        xchg ax, [di]
        ja finish
        call conv
        mov [si - 2], ax
        cmp si, di
        jnz again
finish: add si, di              ; get the final address of result where FFFF is stored
        ret
fact    endp
conv    proc near
        div bl
        xchg ah, bh
        aam
        rol al, cl
        ror ax, cl

173

xchg bh,al aam rol al,cl rol ax,cl xchg al,bh ret conv endp ;total memory used by the executable program = 172 bytes ;total no. of instructins used 85 code ends end start Testing in debug-gAX=23DC BX=0001 CX=02A7 DX=0000 SP=0200 BP=0000 SI=0000 DI=0000 DS=13DC ES=13DC SS=23DC CS=23FC IP=0011 NV UP EI PL NZ NA PO NC 23FC:0011 E80200 CALL 0016 -raxAX 23DC:2f ; (2f hex = 47 decimal) -gAX=14F1 BX=0064 CX=0004 DX=0A1A SP=0200 BP=0002 SI=001E DI=000E DS=13DC ES=13DC SS=23DC CS=23FC IP=0011 NV UP EI PL NZ NA PE NC 23FC:0011 E80200 CALL 0016 ; help for finding the termination of the result-d0 2f13DC:0000 25 86 23 24 15 11 16 81-80 64 29 64 35 51 53 61 %.#$.....d)d5QSa13DC:0010 19 79 96 91 97 63 23 89-12 00 00 00 00 00 FF FF .y...c#.........13DC:0020 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................ ; this result can be verified in the scientific calculator. ; the word FFFF at address 001E flags the end of the result. -raxAX 14F1:7db1 ; this is the maximum input possible- limited by the data segment size. 
-g
AX=17FA  BX=0064  CX=0004  DX=118C  SP=0200  BP=0002  SI=FFFE  DI=7FFE
DS=13DC  ES=13DC  SS=23DC  CS=23FC  IP=0011   NV UP EI NG NZ NA PO NC
23FC:0011 E80200        CALL    0016    ; location FFFE is where the result ends
-d0
13DC:0000  44 92 60 64 35 41 31 94-14 42 51 01 76 89 57 78   D.`d5A1..BQ.v.Wx
13DC:0010  61 91 34 55 16 28 36 58-31 77 63 99 74 35 01 00   a.4U.(6X1wc.t5..
13DC:0020  32 73 17 02 13 31 81 84-14 30 27 95 19 99 90 37   2s...1...0'....7
13DC:0030  95 41 58 53 15 06 58 26-14 94 02 11 78 98 01 64   .AXS..X&....x..d
13DC:0040  93 45 78 83 32 67 60 39-09 31 74 21 27 98 88 43   .Ex.2g`9.1t!'..C
13DC:0050  85 94 64 18 99 56 02 57-67 69 88 66 04 39 88 25   ..d..V.Wgi.f.9.%
13DC:0060  71 42 58 06 97 36 78 12-57 89 63 16 15 96 84 35   qBX..6x.W.c....5
13DC:0070  90 71 06 01 34 12 73 32-65 39 49 62 85 55 61 40   .q..4.s2e9Ib.Ua@
; initial significant part of the result
-d f000
13DC:F000  63 00 16 22 26 47 69 72-27 27 92 14 45 15 37 86   c.."&Gir''..E.7.
13DC:F010  03 91 25 24 99 87 19 12-04 36 67 91 04 70 28 29   ..%$.....6g..p()
13DC:F020  85 43 18 33 10 92 79 52-67 42 70 59 12 54 93 68   .C.3..yRgBpY.T.h
13DC:F030  99 52 81 80 36 06 08 26-12 38 28 75 16 27 89 41   .R..6..&.8(u.'.A
13DC:F040  92 46 92 06 14 21 83 01-44 00 00 00 00 00 00 00   .F...!..D.......
13DC:F050  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:F060  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:F070  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
; from here till the end, the entries are all zeros.
-d fff0
13DC:FFF0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 FF FF   ................
; The result covers the full data segment of 65536 bytes (excepting the last
; two bytes). The result can be verified to a good extent using a scientific
; calculator. Only portions of the result in the data segment are shown above.
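The BCD4 arithmetic of `fact` can be cross-checked with a short high-level model. The sketch below is our own Python illustration, not part of the book's program. The names `to_bcd4`, `factorial_bcd4` and `to_packed_bcd` are hypothetical, and a Python list of integers stands in for the little-endian words in the data segment. Each list element plays the role of one BCD4 word (0 to 9999), `residue` plays the role of SI, and the final packing mimics step 3 and `conv`.

```python
import math

BASE = 10000  # one BCD4 word holds 0..9999, like BX = 10000 in 'fact'


def to_bcd4(value):
    """Split a non-negative integer into little-endian base-10000 words."""
    words = []
    while True:
        value, low = divmod(value, BASE)   # the 'check' loop divides by 10,000
        words.append(low)
        if value == 0:
            return words


def factorial_bcd4(n):
    """Model of 'fact': n! as a little-endian list of BCD4 words."""
    words = to_bcd4(n) if n > 0 else [1]   # step 1: store the input (0! = 1)
    for multiplier in range(n - 1, 1, -1): # step 2: multiply by n-1, ..., 2
        residue = 0                        # the role of SI
        for i, w in enumerate(words):
            # MUL BP, ADD AX,SI, DIV BX: new residue and the BCD4 digit to store
            residue, words[i] = divmod(w * multiplier + residue, BASE)
        if residue:                        # a higher part is left: new words
            words.extend(to_bcd4(residue))
    return words


def to_packed_bcd(words):
    """Model of step 3 and 'conv': big-endian packed BCD, 2 bytes per word."""
    out = bytearray()
    for w in reversed(words):              # most significant word first
        for pair in divmod(w, 100):        # DIV BL: high two digits, then low
            out.append((pair // 10) << 4 | pair % 10)  # AAM, then nibble packing
    return bytes(out)


words = factorial_bcd4(47)
print(len(words), to_packed_bcd(words).hex())
assert int(to_packed_bcd(words).hex()) == math.factorial(47)
```

For n = 47 the model gives 15 words. The FFFF end flag therefore falls at byte offset 30 (1Eh), which agrees with SI=001E in the debug session, and the printed digits are the same as the first 30 bytes of the dump. The model leaves out the REPZ SCASW shortcut: that shortcut only skips multiplying low-order zero words and does not change the result.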

5. Modular multiplication: In many cryptographic processes, there is a need to perform modular multiplication of two large numbers, that is, to obtain the value of their product modulo another large number. The modular value of a number to a base is the remainder obtained on dividing the number by the base. The numbers we are speaking about here are of the order of 1024 bits or 2048 bits, or maybe even more. What we wish to compute is X*Y mod M, with X, Y and M all of the size indicated above.

Algorithm: Interleaved Modular Multiplication. The algorithm shown below does the job reasonably well and is easy to understand. The product is built up by multiplying X by the bits of Y one at a time, starting from the most significant bit. At each stage M is subtracted whenever the partial result reaches M, so the partial result always stays below M. Each stage requires at most 2 subtractions of M.

There are other algorithms available but here we give the interleaved modular multiplication.

INPUT: X, Y, M with $0 \le X, Y \le M$, where $X = \sum_{i=0}^{n-1} x_i 2^i$, and similarly for Y and M in terms of their bits.
OUTPUT: $P = X \cdot Y \bmod M$
n: number of bits of each of X, Y and M
y_i: the i-th bit of Y

1. P = 0;
2. for (i = n - 1; i >= 0; i--)
3.     P = 2 * P;
4.     if (P >= M) P = P - M;
5.     I = y_i * X;
6.     P = P + I;
7.     if (P >= M) P = P - M;

THE ASSEMBLY LANGUAGE PROGRAM MODMUL.ASM

data segment
x       dw 65535, 65535, 65535, 15, 76 dup(0)   ; X = 000F FFFF FFFF FFFF hex
        dw 65535, 65535, 65535, 15, 75 dup(0)   ; Y = 000F FFFF FFFF FFFF hex
y       dw 0                                    ; msw of Y
m       dw 1, 0, 0, 16, 76 dup(0)               ; M = 0010 0000 0000 0001 hex
p1      dw 80 dup(0)                            ; P of the algorithm
p2      dw 80 dup(0)                            ; scratch pad for temporary use
xn      dw 80
p1n     dw 80           ; 80 words, or 1279-bit data (1 bit of margin is left
                        ; to accommodate the carry on addition), can be had
                        ; for each of X, Y and M
data ends

code segment
        assume cs:code, ds:data, es:data
; since there are several variables involved,
; it is better to use the stack for variable store
start:  push ax
        push bx
        push di
        push dx
        push cx                 ; used regs saved in stack
        mov ax, data
        mov ds, ax
        mov es, ax              ; segments initialized
; parameters stored in stack frame
        mov bx, offset x        ; [bp+10]
        push bx
        mov bx, offset y        ; [bp+8]
        push bx
        mov bx, offset m        ; [bp+6]
        push bx
        mov bx, offset p1       ; [bp+4]
        push bx
        mov bx, offset p2       ; [bp+2]
        push bx
        push bp
        mov bp, sp
        sub sp, 2               ; temp at [bp-2]
        cld
        mov bx, [bp+8]          ; start with the msw of Y
lup1:   mov ax, [bx]
        sub bx, 2
        push bx                 ; bx points to next lower word of Y
        mov [bp-2], ax          ; the current word of y
        mov dx, 16              ; 16 bits per word
lup2:   mov di, [bp+4]          ; p1 is doubled
        mov si, di
        call addchk             ; 2*p1 --> p1; (p1 - M) --> p2; if borrow, ignore p2,
                                ; else interchange pointers p1 and p2
        mov ax, [bp-2]
        shl ax, 1               ; next bit of Y into CF
        mov [bp-2], ax
        jnc down1
        mov di, [bp+4]          ; address p1
        mov si, [bp+10]         ; address x
        call addchk             ; p1 + X --> p1; p1 - M --> p2; if borrow, ignore p2,
                                ; else interchange the pointers p1 and p2
down1:  dec dx
        jnz lup2
        pop bx
        dec [p1n]               ; one more word of Y done
        jnz lup1
        mov si, [bp+4]          ; si points to the result
        mov sp, bp
        pop bp
        add sp, 10              ; clear the stack frame
        pop cx
        pop dx
        pop di
        pop bx
        pop ax                  ; restore the saved registers
        int 1                   ; stop here under debug

addchk  proc near
; This procedure adds [si] to [di], leaving the sum in [di] (that is, in p1),
; subtracts m from the sum and puts the result in p2. If the subtraction ends
; in a borrow, the result in p2 is ignored. Else, if there is no borrow, the
; result of the subtraction must become p1. This is achieved by simply
; interchanging the p1 and p2 pointers in the stack frame.
        mov cx, xn
        clc
back10: lodsw
        adc ax, [di]
        stosw                   ; [si] + [di] --> [di]
        loop back10
        mov si, [bp+4]          ; address p1
        mov di, [bp+2]          ; address p2
        mov bx, [bp+6]          ; address m
        mov cx, xn
        clc
back11: lodsw                   ; computation p1 - m --> p2
        sbb ax, [bx]
        inc bx
        inc bx                  ; inc is used so that carry will not change
        stosw
        loop back11
        jc down10
        mov ax, [bp+2]
        xchg ax, [bp+4]
        mov [bp+2], ax          ; pointers to p1 and p2 exchanged
down10: ret
addchk  endp
code ends
        end start
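The pointer exchange in `addchk` implements steps 3 to 7 without ever copying p2 back into p1. Here is a rough Python model of MODMUL.ASM to show this. It is our own sketch: the names `add_words`, `sub_words`, `addchk`, `modmul_words` and `to_int` are illustrative, and Python lists stand in for the 80-word arrays. Instead of exchanging the two pointers held at [bp+4] and [bp+2], the model simply returns the pair in swapped order.

```python
WORDS = 80       # xn: every array is 80 little-endian 16-bit words
MASK = 0xFFFF


def add_words(a, b):
    """LODSW / ADC AX,[DI] / STOSW over all words: a + b (carry out dropped)."""
    out, carry = [], 0
    for u, v in zip(a, b):
        s = u + v + carry
        out.append(s & MASK)
        carry = s >> 16
    return out


def sub_words(a, b):
    """LODSW / SBB AX,[BX] / STOSW over all words: returns (a - b, borrow)."""
    out, borrow = [], 0
    for u, v in zip(a, b):
        d = u - v - borrow
        out.append(d & MASK)
        borrow = 1 if d < 0 else 0
    return out, borrow


def addchk(p1, addend, m):
    """p1 + addend --> p1, p1 - m --> p2; if no borrow, p1 and p2 swap roles.
    Returns the (p1, p2) pair after the possible exchange."""
    total = add_words(addend, p1)
    diff, borrow = sub_words(total, m)
    return (total, diff) if borrow else (diff, total)


def modmul_words(x, y, m):
    """X*Y mod M; Y is scanned from its most significant word and bit."""
    p1 = [0] * WORDS
    for word in reversed(y):                  # lup1: one word of Y at a time
        for bit in range(15, -1, -1):         # lup2: 16 bits, top bit first
            p1, _ = addchk(p1, p1, m)         # P = 2P, then trial subtraction
            if (word >> bit) & 1:             # the bit shifted out by SHL
                p1, _ = addchk(p1, x, m)      # P = P + X, then trial subtraction
    return p1


def to_int(words):
    """Value of a little-endian array of 16-bit words."""
    return sum(w << (16 * i) for i, w in enumerate(words))


# Data of the listing: X = Y = 000F FFFF FFFF FFFF hex, M = 0010 0000 0000 0001 hex
x = [65535, 65535, 65535, 15] + [0] * 76
y = [65535, 65535, 65535, 15] + [0] * 76
m = [1, 0, 0, 16] + [0] * 76
print(to_int(modmul_words(x, y, m)))
```

With the listing's data, X = M - 2, so X*Y mod M = (-2)(-2) mod M = 4. That is the value the model returns, and it is the value SI should point to at the end of the assembly run.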

The program takes 152 bytes of code, apart from the data space of the data segment. All the registers are preserved across the program except SI, which points to the result of the modular multiplication, and the segment registers DS and ES, which are loaded with the data segment. The data segment must have the indicated labels for the program to work. If these labels are adhered to, the whole program can be used as a subroutine. Alternatively, the offset addresses of the data may be passed to the subroutine through registers, and these registers may be pushed directly onto the stack to form the stack frame. That will be helpful in cryptographic applications.

The program works only when at least one of the two numbers to be multiplied is not larger than M, and that one should be taken as X. The size of Y does not matter. In cryptographic processes such as the RSA algorithm, both X and Y are smaller than M. The reason can be seen from the algorithm. Y only supplies the bits y_i, while X is added to P (which is below M) in step 6. Step 7 subtracts M only once, so P returns below M only if X does not exceed M.
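The size condition can be checked with a direct transcription of steps 1 to 7 on Python integers. This is a sketch for experimentation, not the book's code. The function name `interleaved_modmul` and its parameter `n` (the number of bits of Y scanned) are our own.

```python
import random


def interleaved_modmul(x, y, m, n):
    """P = X*Y mod M by steps 1-7, scanning the n bits of Y from the top."""
    p = 0                                   # step 1
    for i in range(n - 1, -1, -1):          # step 2: i = n-1 down to 0
        p = 2 * p                           # step 3
        if p >= m:                          # step 4: one subtraction suffices
            p -= m
        p += ((y >> i) & 1) * x             # steps 5 and 6: I = y_i * X
        if p >= m:                          # step 7: one subtraction
            p -= m
    return p


# With 0 <= X <= M the result is exact, whatever the size of Y.
random.seed(2024)
for _ in range(1000):
    m = random.randrange(2, 1 << 64)
    x = random.randrange(0, m + 1)
    y = random.randrange(0, 1 << 64)
    assert interleaved_modmul(x, y, m, 64) == x * y % m

# With X far above M a single subtraction is not enough: 17 * 1 mod 5 is 2,
# but step 7 only brings 17 down to 12.
print(interleaved_modmul(17, 1, 5, 1))
```

The last line prints 12 instead of the correct 2. This is exactly the failure the size condition guards against.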

6. Division of large numbers: Division is a more complex problem than multiplication when large numbers are involved. In floating-point routines handled by software, division by a number a is done by first taking an approximate value of 1/a from a table of a versus 1/a. This value is then improved to the desired accuracy in a few iterations of the Newton-Raphson method. The theory of the method is given below.

Let x be an approximate inverse of a, such that $ax = 1 - d$, where d is the (signed) error in the value of x, per unit. That is, if $x_0 = 1/a$ is the correct inverse of a, then $x = x_0(1 - d)$. We are trying to solve the equation $a = 1/x$, or $f(x) = a - 1/x = 0$.


Then the next approximation x' for x is given by the Newton-Raphson formula

$$x' = x - \frac{f(x)}{f'(x)} = x - \frac{a - 1/x}{1/x^2} = x - ax^2 + x = x(2 - ax).$$

Since $ax = 1 - d$, this becomes $x' = x(2 - 1 + d)$, that is, $x' = x(1 + d)$. Then

$$ax' = ax(1 + d) = (1 - d)(1 + d) = 1 - d^2.$$

The last equation above shows that the error in the iterated value x' is the square of the error in x. Each iteration involves only multiplication and subtraction, because $x' = x(2 - ax)$ needs two multiplications and one subtraction. Since the error is squared in each iteration, the number of correct bits doubles. If the original approximation has 4-bit accuracy, the next iteration will have 8-bit accuracy, the one after that 16-bit accuracy, and so on. Even starting from 4-bit accuracy, we can reach 64-bit accuracy for the result in 4 iterations. The table below is guaranteed to be accurate to 6 bits. It can thus give better than 32-bit accuracy in 3 rounds of iteration (actually 48 bits), which is good enough for single-precision floating-point calculations. One more round will give better than what is required for the extended double-precision format of IEEE Standard 754.
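The doubling of correct bits can be seen numerically. The Python sketch below is our own illustration, and `newton_reciprocal` is a hypothetical helper. It uses exact rational arithmetic from the standard `fractions` module. The inputs are a = 1.1001101 binary and the starting value A0 H = 0.625 that the next paragraph quotes for it.

```python
import math
from fractions import Fraction


def newton_reciprocal(a, x0, steps):
    """Iterates of x' = x*(2 - a*x), starting from x0 ~ 1/a (exact arithmetic)."""
    xs = [Fraction(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x * (2 - a * x))          # two multiplications, one subtraction
    return xs


a = Fraction(0b11001101, 128)              # 1.1001101 binary = 1.6015625
x0 = Fraction(0xA0, 256)                   # 0.1010 0000 binary = A0 H = 0.625
for k, x in enumerate(newton_reciprocal(a, x0, 3)):
    d = 1 - a * x                           # signed error per unit: a*x = 1 - d
    print(k, float(x), -math.log2(abs(d)))  # number of correct bits
```

For this particular a, the starting error happens to be exactly $-2^{-10}$. The printed bit counts are therefore 10, 20, 40 and 80, and each error is exactly the square of the previous one, as $ax' = 1 - d^2$ predicts.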

The table below is prepared as follows. The inverse of 1.xxxyyy1 (where each x and y is a bit, 1 or 0) is computed correct to 8 binary digits and placed in the table against the position xxxyyy, where xxx corresponds to the row and yyy to the column. For example, the entry in row 100 at column 110 corresponds to the inverse of 1.1001101. This inverse, correct to 8 bits, turns out to be 0.1010 0000, or A0 H with a leading binary point. The maximum error occurs exactly at the points indicated (1.xxxyyy), because the values are calculated for 1.xxxyyy1 but marked against 1.xxxyyy. At any other point of the interval, the inverse is closer to the tabulated value. For example, the value marked at 1.100110 will be used over the range 1.100 110 to very nearly 1.100 111. Taking it as corresponding to 1.100 1101 produces the maximum error at the two extremities of the range, and the value becomes increasingly accurate towards the middle of the interval. The worst error occurs at the value 1.000 000, and this value is correct up to 7 bits. Three iterations then give an accuracy of 56 bits, which corresponds to the IEEE double-precision standard; this is good enough for any normal FP calculation. Once the inverse is obtained, division can be performed by multiplying by the inverse.

xxx \ yyy  000  001  010  011  100  101  110  111
000        1FC  1F4  1EC  1E5  1DE  1D7  1D0  1CA
001        1C3  1BD  1B7  1B2  1AC  1A6  1A1  19C
010        197  192  18D  188  183  17F  17A  176
011        172  16E  16A  166  162  15E  15A  157
100        153  150  14C  149  146  142  13F  13C
101        139  136  133  130  12E  12B  128  125
110        123  120  11E  11B  119  116  114  112
111        10F  10D  10B  109  107  105  103  101

7-bit accuracy table of inverses (rows: xxx, columns: yyy; entries in hex)
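Read each entry as a 9-bit fraction with the binary point before its leading 1; for example, 13F stands for 0.1001 1111 1 binary, about 0.623. On that reading, every entry of the table above equals 512 / 1.xxxyyy1 rounded down. This is our own reading of the table, not something stated in the text. The short Python sketch below regenerates the table from that rule; `inverse_table` is a hypothetical name.

```python
def inverse_table():
    """Rows xxx, columns yyy: floor(512 / 1.xxxyyy1b), as 9-bit values."""
    table = []
    for xxx in range(8):
        row = []
        for yyy in range(8):
            k = 8 * xxx + yyy                 # the six bits xxxyyy
            # 1.xxxyyy1b = (129 + 2k) / 128, so 512 / 1.xxxyyy1b = 65536 / (129 + 2k)
            row.append(65536 // (129 + 2 * k))
        table.append(row)
    return table


print("xxx\\yyy " + "  ".join(f"{c:03b}" for c in range(8)))
for xxx, row in enumerate(inverse_table()):
    print(f"{xxx:03b}     " + "  ".join(f"{e:03X}" for e in row))
```

Integer division keeps the computation exact, so no floating-point rounding can disturb the truncated last bit.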


We shall not go into the details of this process here.

7. Division using the method followed for modular multiplication: Instead, we present a modification of the modular multiplication process of the previous section to carry out division of arbitrarily long numbers. The process is the same as the modular multiplication given, choosing X = 1 and Y as the dividend; the modular base is now the divisor. The program is given below. Because 80 words are provided for the data, the program handles up to a 1280-bit dividend and up to a 1279-bit divisor. The working memory space can be increased by providing a larger word size for the data. We need 5 times the data size, and this program can comfortably accommodate 3000 hex bytes of data (i.e., over 12000 decimal bytes, or over 96000 bits). Once such a provision has been made in the program, any smaller data can be handled by making the leading words zero. The efficiency of the program can be increased by working with the actual word size, using the data-size determining macro of the program given earlier. We have provided it for sizing the divisor data (the drsize macro), but not for the dividend. The size of the dividend can be accommodated by giving this size at the label ndd (standing for the number of words of the dividend) in the program. The program needs no alteration other than replacing the value of ndd by the size of the dividend.

; Division of arbitrary long numbers
data segment
dvd     dw 40 dup(4096), 65530, 39 dup(65535)   ; dividend
dr      dw 65534, 39 dup(65535), 40 dup(0)      ; divisor
qt      dw 80 dup(?)    ; both quotient and remainder spaces are
r1      dw 80 dup(?)    ; provided the same as the dividend space
r2      dw 80 dup(?)
ndd     dw 80           ; same space as the dividend at start
spare   dw 7 dup(?)
data ends

code segment
        assume cs:code, ds:data, es:data

strt:   mov ax, data
        mov ds, ax
        mov es, ax
; preparing stack frame
        mov ax, offset dvd      ; [bp+14]
        push ax
        mov ax, offset dr       ; [bp+12]
        push ax
        mov ax, offset qt       ; [bp+10]
        push ax
        mov ax, offset r1       ; [bp+8]
        push ax
        mov ax, offset r2       ; [bp+6]
        push ax
        mov ax, ndd             ; [bp+4]
        push ax
        call longdiv            ; ret address [bp+2]
        int 1
;
; now the macros used
;
clr     macro offs, n           ;; fill n words at offs with 0 (ax is cleared here)
        mov di, offs
        mov cx, n
        sub ax, ax
        rep stosw
        endm
;
double  macro offsd, n          ;; shift the n-word array at offsd left by 1 bit
        local dub
        mov si, offsd
        mov di, si
        mov cx, n
        clc
dub:    lodsw
        adc ax, ax
        stosw
        loop dub
        endm
;
; to handle variable size divisors, it is necessary to find the
; exact word size of the divisor. Note the dividend size is
; flexible and accommodated in the process.
drsize  macro ofset, nsize
        local size1
        mov si, ofset
        mov cx, nsize
        add si, cx
        add si, cx
size1:  sub si, 2               ;; msw address of data at ofset
        mov ax, [si]
        or ax, ax
        loopz size1
        rol ax, 1               ;; top bit of the leading word into CF
        adc cx, 0               ;; cx = index of the leading word, +1 if its top bit is set
        endm
;
subt    macro offs1, offs2, offs3, n    ;; [offs1] - [offs2] --> [offs3]
        local subt1
        mov si, offs1
        mov di, offs3
        mov bx, offs2
        mov cx, n
        clc
subt1:  lodsw
        sbb ax, [bx]
        inc bx
        inc bx                  ;; inc is used so that carry remains same
        stosw
        loop subt1
        endm
;
longdiv proc near
        push bp
        mov bp, sp
        sub sp, 4               ; [bp-2] for loop count, [bp-4] for partially
                                ; shifted data
        cld
        clr [bp+10], [bp+4]     ; clear quotient space
        clr [bp+8], [bp+4]      ; clear remainder space
; now, the real work!
        drsize [bp+12], [bp+4]  ; note: the cx result is not used below
        mov ax, [bp+4]
        mov [bp-2], ax          ; current word count for loop
        mov bx, [bp+14]         ; dividend start address
        dec ax
        shl ax, 1
        add bx, ax              ; dividend end address
lup1:   push bx
        mov ax, [bx]            ; msw of dividend
        mov [bp-4], ax          ; dividend partial word (current)
        mov dx, 16
lup2:   double [bp+10], [bp+4]  ; first double the quotient and the remainder r1
        double [bp+8], [bp+4]
        mov ax, [bp-4]
        shl ax, 1
        mov [bp-4], ax
        mov bx, [bp+8]
        adc word ptr [bx], 0    ; inc r1, on carry
        subt [bp+8], [bp+12], [bp+6], 41   ; r1 - divr = r2; the fixed count of 41
                                ; words covers a divisor of up to 40 words plus
                                ; one word for the carry out of r1
        jc down
        mov bx, [bp+10]
        inc word ptr [bx]       ; quotient to be incremented; the lsw of the
                                ; quotient ends with a 0 after doubling, hence
                                ; incrementing the lsw is all that is required
; now interchange r1 and r2 pointers
        mov ax, [bp+8]
        xchg ax, [bp+6]
        mov [bp+8], ax
down:   dec dx
        jnz lup2
        pop bx
        sub bx, 2
        dec word ptr [bp-2]
        jnz lup1
        mov si, [bp+8]          ; pointer to remainder
        mov sp, bp
        pop bp
        ret 12
longdiv endp
code ends
        end strt

The program takes 211 bytes of memory in the code segment and about 810 bytes of memory in the data segment, with about 30 bytes in the stack segment. On testing the program with the above data, the following results are obtained, as seen from the relevant data segment area.

; The result of testing the program in debug, after assembling and linking, is presented below.

-g
AX=FFFF  BX=FFFE  CX=0000  DX=0000  SP=0000  BP=0000  SI=01E0  DI=02D2
DS=13DC  ES=13DC  SS=13DC  CS=140F  IP=0024   NV UP EI PL ZR NA PE CY
140F:0024 55            PUSH    BP      ; first instruction of the proc (after INT 1)
-d 0 32f
; the dividend
13DC:0000  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:0010  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:0020  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:0030  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:0040  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:0050  FA FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0060  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0070  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0080  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0090  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
; the divisor
13DC:00A0  FE FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:00B0  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:00C0  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:00D0  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:00E0  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:00F0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0100  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0110  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0120  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0130  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
; the quotient
13DC:0140  FC FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0150  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0160  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0170  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0180  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0190  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:01A0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:01B0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:01C0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:01D0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
; the remainder is here in this case, as pointed to by the SI reg.
13DC:01E0  F8 0F 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:01F0  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:0200  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:0210  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:0220  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:0230  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0240  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0250  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0260  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0270  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
; r2, the scratch pad area
13DC:0280  FA 0F 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:0290  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:02A0  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:02B0  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:02C0  00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10   ................
13DC:02D0  FF FF 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:02E0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:02F0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0300  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0310  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0320  50 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   P...............

I have verified the result above by using the principles of Vedic Mathematics. The highlighted regions are helpful in the verification.
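A short algebraic check of the same result follows. It is our own derivation, not the author's Vedic method. With 40-word operands, write B = 2^640. The upper 40 words of the dividend (FFFA, FFFF, ..., FFFF) form B - 6, and its lower 40 words (every word 1000 hex) form L. The divisor (FFFE, FFFF, ..., FFFF) is D = B - 2.

```latex
\begin{aligned}
B &= 2^{640}, \qquad D = B - 2, \qquad N = (B-6)\,B + L,
  \qquad L = \mathtt{1000}_{16} \sum_{i=0}^{39} 2^{16i},\\
N &= B^2 - 6B + L = (B-2)(B-4) + (L - 8), \qquad 0 \le L - 8 < D,\\
Q &= B - 4, \qquad R = L - 8.
\end{aligned}
```

So the quotient words are FFFC, FFFF, ..., FFFF and the remainder words are 0FF8, 1000, ..., 1000, exactly as the dump shows.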


; The same program tested with another set of data to indicate data size flexibility of the program.

data segment
dvd     dw 23567, 34239, 12345, 77 dup(0)       ; dividend
dr      dw 0abcdh, 0123, 78 dup(0)              ; divisor (0123 is decimal 123, i.e. 7Bh)
qt      dw 80 dup(?)    ; both quotient and remainder spaces are
r1      dw 80 dup(?)    ; provided the same space as the dividend space
r2      dw 80 dup(?)
ndd     dw 80           ; same space as the dividend at start
spare   dw 7 dup(?)
data ends

; Results of test

-g
AX=FFFF  BX=FFFE  CX=0000  DX=0000  SP=0000  BP=0000  SI=0280  DI=0232
DS=13DC  ES=13DC  SS=13DC  CS=140F  IP=0024   NV UP EI PL ZR NA PE CY
140F:0024 55            PUSH    BP
-d 0 32f
; dividend
13DC:0000  0F 5C BF 85 39 30 00 00-00 00 00 00 00 00 00 00   .\..90..........
13DC:0010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0030  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0040  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0050  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0060  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0070  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0080  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0090  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
; divisor
13DC:00A0  CD AB 7B 00 00 00 00 00-00 00 00 00 00 00 00 00   ..{.............
13DC:00B0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:00C0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:00D0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:00E0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:00F0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0100  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0110  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0120  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0130  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
; quotient
13DC:0140  50 D3 63 00 00 00 00 00-00 00 00 00 00 00 00 00   P.c.............
13DC:0150  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0160  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0170  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0180  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0190  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:01A0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:01B0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:01C0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:01D0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
; scratch pad r2 changed to this place
13DC:01E0  32 09 BD FF FF FF FF FF-FF FF FF FF FF FF FF FF   2...............
13DC:01F0  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0200  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0210  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0220  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
13DC:0230  FF FF 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0240  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0250  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0260  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0270  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
; the remainder r1 changed to this place; see the register SI for the
; starting address of the remainder.
13DC:0280  FF B4 38 00 00 00 00 00-00 00 00 00 00 00 00 00   ..8.............
13DC:0290  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:02A0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:02B0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:02C0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:02D0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:02E0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:02F0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0300  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0310  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
13DC:0320  50 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   P...............

This result can be checked with a scientific calculator.
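Both test runs can be reproduced with a model of `longdiv`. The routine is restoring (shift-and-subtract) binary division. The remainder r1 is doubled, the next dividend bit is shifted in, and a trial subtraction of the divisor decides the next quotient bit. The Python model below is our own sketch for checking results, not the book's code. The functions `double`, `subt` and `longdiv` mirror the macro and procedure names of the listing, and `n_sub` (our name) stands for the fixed count 41 passed to `subt`.

```python
MASK = 0xFFFF


def double(words):
    """'double' macro: shift a little-endian word array left by one bit, in place."""
    carry = 0
    for i, w in enumerate(words):
        s = 2 * w + carry                   # ADC AX,AX
        words[i], carry = s & MASK, s >> 16


def subt(a, b, n):
    """'subt' macro: a - b over the lowest n words; returns (difference, borrow)."""
    out, borrow = [], 0
    for u, v in zip(a[:n], b[:n]):
        d = u - v - borrow                  # SBB AX,[BX]
        out.append(d & MASK)
        borrow = 1 if d < 0 else 0
    return out, borrow


def longdiv(dvd, dr, n_sub=41):
    """Model of 'longdiv': restoring shift-and-subtract division of word arrays.

    n_sub stands for the fixed word count passed to 'subt'; it must cover the
    divisor plus the extra bit that doubling the remainder can produce.
    Returns (quotient, remainder) as lists of the same length as dvd.
    """
    size = len(dvd)                         # ndd
    qt, r1 = [0] * size, [0] * size         # the 'clr' macro
    for word in reversed(dvd):              # lup1: dividend msw first
        for bit in range(15, -1, -1):       # lup2: 16 bits per word
            double(qt)
            double(r1)
            r1[0] |= (word >> bit) & 1      # ADC word ptr [bx],0: next dividend bit
            r2, borrow = subt(r1, dr, n_sub)
            if not borrow:                  # r1 >= divisor: keep r1 - divisor
                qt[0] |= 1                  # the INC: lsw of qt ends in 0 here
                r1 = r2 + r1[n_sub:]        # the r1/r2 pointer exchange
    return qt, r1


def to_int(words):
    """Value of a little-endian array of 16-bit words."""
    return sum(w << (16 * i) for i, w in enumerate(words))


# First data set of the listing
dvd = [4096] * 40 + [65530] + [65535] * 39
dr = [65534] + [65535] * 39 + [0] * 40
qt, rem = longdiv(dvd, dr)
print(hex(qt[0]), hex(qt[39]), hex(rem[0]), hex(rem[1]))
```

The first data set gives quotient words FFFC, FFFF, ..., FFFF and remainder words 0FF8, 1000, ..., 1000. The second gives quotient 0063D350 hex and remainder 0038B4FF hex. Both agree with the debug dumps.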

8. Some Macros that can be used for Large Number handling: Large numbers can be considered as single data entities. To handle them this way, we allot a data array to each large number. The array length can be of a fixed size, just as we handle word data in registers. The register size is nominally 16 bits, but that does not prevent us from storing and handling any data of up to 16 bits in a register. The simple number 1 can also be stored in a 16-bit register, where it is considered as 0001 hex, and it can be handled as 16-bit data without any computational problem. In a similar way, we can allot, say, 256 bytes of memory to store any data of up to 256 bytes, or 2048 bits. The data can be referenced by the start address of the array and by the number of data words used. The array is filled from the start address and extends as far as significant bits exist in the data; the remaining data words are filled with 0's. For example, see the 3-byte (2-word) data stored as the remainder value in the array just above (memory locations 13DC:0280 - 031F). It can be considered as 3-byte data, 2-word data or 160-byte data without any ambiguity. The macros presented below can be viewed as handling data in this fashion. It is a good method to store data so that all the numbers can be considered as arrays of equal size. The algorithms will normally be able to handle zero data in the leading words without serious problems, other than taking more time processing the zero words as valid data. If one is particular, one can extract the exact word size information and limit the computations to the size available. But this is not necessary, as some of the programs above indicate. However, in the set of macros given below we have included a macro to find the actual number of significant words of a data array.
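The fixed-size representation can be illustrated in Python. This is a sketch of our own, and `significant_words`, `significant_bytes` and `to_int` are hypothetical helpers. The same 80-word array can be read as 160-byte, 2-word or 3-byte data, and its value does not change when the leading zero words are dropped.

```python
def significant_words(words):
    """Number of words up to and including the highest non-zero word."""
    count = len(words)
    while count and words[count - 1] == 0:
        count -= 1
    return count


def significant_bytes(words):
    """Number of bytes up to and including the highest non-zero byte."""
    raw = b"".join(w.to_bytes(2, "little") for w in words)
    return len(raw.rstrip(b"\x00"))


def to_int(words):
    """Value of a little-endian array of 16-bit words."""
    return sum(w << (16 * i) for i, w in enumerate(words))


# The remainder of the second division test: FF B4 38 00 followed by zeros (80 words)
remainder = [0xB4FF, 0x0038] + [0] * 78
print(significant_words(remainder), significant_bytes(remainder), 2 * len(remainder))
assert to_int(remainder) == to_int(remainder[:2]) == 0x38B4FF
```

This prints 2, 3 and 160, the three equivalent sizes mentioned in the paragraph above.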

Here is a study of some macros useful in handling arbitrarily long integers.

MACROS USEFUL FOR HANDLING LARGE NUMBERS

data segment
sc1     dw 02, 31 dup(0), 0fffeh, 30 dup(0ffffh), 0fffh, 64 dup(0)
sc2     dw 32 dup(0ffffh), 96 dup(0)
sc3     dw 128 dup(0)
des     dw 128 dup(?)
des2    dw 128 dup(?)
n       dw 128
data ends
;
code segment
        assume cs:code, ds:data
;
; arbitrarily long integer handling macros
;
; (i) end address calculation: reg = address of the last word; cx = n
;
lend    macro src, reg, n
        mov reg, offset [src]
        mov cx, n
        dec cx
        add reg, cx
        add reg, cx
        inc cx
        endm
;
; (ii) [src1].as.[src2] --> [dest]; direction UP assumed
; 'as' is 'adc' or 'sbb'
;
las     macro src1, src2, dest, n, as
        local las1
        mov si, offset [src1]
        mov bx, offset [src2]
        mov di, offset [dest]
        mov cx, n
        clc
las1:   lodsw
        as ax, [bx]
        stosw
        inc bx
        inc bx
        loop las1
        endm
;
; (iii) move arr1 to arr2; non-overlapping arrays
;
movarr  macro src, dest, n
        mov si, offset [src]
        mov di, offset [dest]
        mov cx, n
        rep movsw
        endm
;
; (iv) lsig: obtain the significant number of words of a long integer;
; on exit cx has the count, and the sign flag (from OR AX,AX) reflects the
; leading bit of the leading word
;
lsig    macro src, n
        local lsig1
        mov si, offset [src]
        mov cx, n
        dec cx
        add si, cx
        add si, cx
        add cx, 2
        std
lsig1:  lodsw
        or ax, ax
        loopz lsig1
        cld
        endm
;
strt:   mov ax, data
        mov ds, ax
        mov es, ax
        lend sc1, si, n
        las sc1, sc2, des, n, adc
        movarr des, des2, n
        las des2, sc1, des2, n, sbb
        lsig sc1, n
        int 1
code ends
        end strt
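A Python model of the four macros, applied to the same data, shows what the test program is expected to compute. This is our own sketch. The Python function names simply mirror the macro names, and the functions return values instead of writing into des and des2.

```python
N = 128          # n dw 128
MASK = 0xFFFF

# Data of the test program (little-endian 16-bit words)
sc1 = [2] + [0] * 31 + [0xFFFE] + [0xFFFF] * 30 + [0x0FFF] + [0] * 64
sc2 = [0xFFFF] * 32 + [0] * 96


def lend(offset, n):
    """'lend': address of the last word of an n-word array; CX is left at n."""
    return offset + 2 * (n - 1), n


def las(src1, src2, n, op):
    """'las': ADC or SBB word by word with CF cleared first; returns (dest, CF)."""
    dest, cf = [], 0
    for a, b in zip(src1[:n], src2[:n]):
        r = a + b + cf if op == "adc" else a - b - cf
        dest.append(r & MASK)
        cf = 1 if (r > MASK or r < 0) else 0
    return dest, cf


def movarr(src, n):
    """'movarr': REP MOVSW copy of n words."""
    return list(src[:n])


def lsig(src, n):
    """'lsig': returns (CX, SF): significant words and top bit of the leading word."""
    for cx in range(n, 0, -1):
        if src[cx - 1]:
            return cx, bool(src[cx - 1] & 0x8000)
    return 0, False


si, cx = lend(0x0000, N)              # sc1 is at offset 0 in the test run
des, _ = las(sc1, sc2, N, "adc")
des2 = movarr(des, N)
des2, _ = las(des2, sc1, N, "sbb")
print(hex(si), cx, des2 == sc2, lsig(sc1, N))
```

The model gives SI = 00FE with CX = 128 from `lend`. It also shows that des2 ends up equal to sc2, since (sc1 + sc2) - sc1 = sc2, and that `lsig` finds 64 significant words in sc1 with the sign flag clear (leading word 0FFF).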

TESTING IN DEBUG

-u 0 64
; INITIALISATION
0B96:0000 B8450B        MOV     AX,0B45
0B96:0003 8ED8          MOV     DS,AX
0B96:0005 8EC0          MOV     ES,AX
; TESTING END ADDRESS COMPUTATION OF DATA [0] OR SC1
0B96:0007 BE0000        MOV     SI,0000
0B96:000A 8B0E0005      MOV     CX,[0500]
0B96:000E 49            DEC     CX
0B96:000F 03F1          ADD     SI,CX
0B96:0011 03F1          ADD     SI,CX
0B96:0013 41            INC     CX
; TESTING LAS FOR ADD; SC1 + SC2 --> [300] OR [0] + [100] --> [300]
0B96:0014 BE0000        MOV     SI,0000
0B96:0017 BB0001        MOV     BX,0100
0B96:001A BF0003        MOV     DI,0300
0B96:001D 8B0E0005      MOV     CX,[0500]
0B96:0021 F8            CLC
0B96:0022 AD            LODSW
0B96:0023 1307          ADC     AX,[BX]
0B96:0025 AB            STOSW
0B96:0026 43            INC     BX
0B96:0027 43            INC     BX
0B96:0028 E2F8          LOOP    0022
; TESTING MOVE; [300] --> [400]
0B96:002A BE0003        MOV     SI,0300
0B96:002D BF0004        MOV     DI,0400
0B96:0030 8B0E0005      MOV     CX,[0500]
0B96:0034 F3            REPZ
0B96:0035 A5            MOVSW
; TESTING LAS FOR SUBTRACT; [400] - [0] --> [400]
0B96:0036 BE0004        MOV     SI,0400
0B96:0039 BB0000        MOV     BX,0000
0B96:003C BF0004        MOV     DI,0400
0B96:003F 8B0E0005      MOV     CX,[0500]
0B96:0043 F8            CLC
0B96:0044 AD            LODSW
0B96:0045 1B07          SBB     AX,[BX]
0B96:0047 AB            STOSW
0B96:0048 43            INC     BX
0B96:0049 43            INC     BX
0B96:004A E2F8          LOOP    0044
; TESTING LSIG; SIGNIFICANT NUMBER OF WORDS OF [0] --> CX; SIGN FLAG INDICATES
; THE LEADING BIT OF THE LEADING WORD
0B96:004C BE0000        MOV     SI,0000
0B96:004F 8B0E0005      MOV     CX,[0500]
0B96:0053 49            DEC     CX
0B96:0054 03F1          ADD     SI,CX
0B96:0056 03F1          ADD     SI,CX
0B96:0058 83C102        ADD     CX,+02
0B96:005B FD            STD
0B96:005C AD            LODSW
0B96:005D 0BC0          OR      AX,AX
0B96:005F E1FB          LOOPZ   005C
0B96:0061 FC            CLD
0B96:0062 CD01          INT     01
0B96:0064 2AE4          SUB     AH,AH
-g 7                    ; LOAD THE SEGMENT REGISTERS
AX=0B45  BX=0000  CX=0574  DX=0000  SP=0000  BP=0000  SI=0000  DI=0000
DS=0B45  ES=0B45  SS=0B45  CS=0B96  IP=0007   NV UP EI PL NZ NA PO NC
0B96:0007 BE0000        MOV     SI,0000
-d 0 500                ; the original data in the data arrays
0B45:0000  02 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0030  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0040  FE FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
0B45:0050  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
0B45:0060  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
0B45:0070  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F   ................
0B45:0080  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0090  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:00A0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:00B0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:00C0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:00D0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:00E0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:00F0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0100  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
0B45:0110  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
0B45:0120  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
0B45:0130  FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF   ................
0B45:0140  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0150  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0160  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0170  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0180  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0190  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:01A0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:01B0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:01C0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:01D0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:01E0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:01F0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0200  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0210  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0220  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0230  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0240  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0250  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0260  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0270  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0280  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0290  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:02A0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:02B0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:02C0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:02D0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:02E0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:02F0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0300  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0310  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0320  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0330  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0340  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0350  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0360  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0370  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0380  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0390  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:03A0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:03B0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:03C0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:03D0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:03E0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:03F0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
0B45:0400  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
................0B45:0410 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0420 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0430 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0440 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0450 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0460 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0470 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0480 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0490 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0500 80 ; the value of n .-g 14;COMPUTE END ADDRESS OF SOURCE1AX=0B45 BX=0000 CX=0080 DX=0000 SP=0000 BP=0000 SI=00FE DI=0000 DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=0014 NV UP EI PL NZ AC PO NC 0B96:0014 BE0000 MOV SI,0000 -g 2a;TEST LAS FOR ADD; AX=0000 BX=0200 CX=0000 DX=0000 SP=0000 BP=0000 SI=0100 DI=0400 DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=002A NV UP EI PL NZ AC PE NC 0B96:002A BE0003 MOV SI,0300 -d 0 3ff; DATA [0] + [100] à [300]; DATA [0]0B45:0000 02 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

188

0B45:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0040 FE FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0050 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0060 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0070 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................0B45:0080 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0090 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................;DATA [100]0B45:0100 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0110 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0120 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0130 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0140 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0150 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0160 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0170 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0180 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0190 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:01A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:01B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:01C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:01D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
................0B45:01E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:01F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................;; DATA [300]0B45:0300 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0310 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0320 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0330 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0340 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0350 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0360 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0370 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................0B45:0380 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0390 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................-g 36;DATA [300] à [400]AX=0000 BX=0200 CX=0000 DX=0000 SP=0000 BP=0000 SI=0400 DI=0500 DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=0036 NV UP EI PL NZ AC PE NC 0B96:0036 BE0004 MOV SI,0400 -d 300 4ff0B45:0300 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0310 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0320 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0330 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

189

0B45:0340 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0350 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0360 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0370 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................0B45:0380 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0390 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:03F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0400 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0410 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0420 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0430 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0440 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0450 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0460 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0470 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................0B45:0480 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0490 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
................0B45:04F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................-g 4c; TESTING LAS FOR SUBTRACTION;AX=0000 BX=0100 CX=0000 DX=0000 SP=0000 BP=0000 SI=0500 DI=0500 DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=004C NV UP EI PL NZ AC PE NC 0B96:004C BE0000 MOV SI,0000 -d; DATA [400] – DATA [0] à DATA [400]; DATA [400] BEFORE EXECUTION OF LAS0B45:0400 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0410 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0420 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0430 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0440 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0450 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0460 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0470 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................0B45:0480 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0490 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

;DATA [0]0B45:0000 02 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0040 FE FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................

190

0B45:0050 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0060 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0070 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................0B45:0080 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0090 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:00F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................; DATA [400] AFTER SUBTRACTION0B45:0400 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0410 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0420 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0430 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................0B45:0440 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0450 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0460 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0470 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0480 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:0490 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................0B45:04F0 00 00 00 00 00 00 00 00-00 00 00 00 
00 00 00 00 ................-g; VERIFICATION OF LSIG, THE PL FLAG SHOWS THE LEADING DIGIT OF THE MS WORD IS 0AX=0FFF BX=0100 CX=0040 DX=0000 SP=0000 BP=0000 SI=007C DI=0500 DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=0064 NV UP EI PL NZ NA PE NC 0B96:0064 2AE4 SUB AH,AH -q
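The dumps above can be cross-checked with a short model. The Python sketch below is an illustration, not one of the book's programs: the function names las_add, las_sub and lsig are chosen here only to mirror the routines being tested, each number is held as a list of 16-bit words with the least significant word first (as in memory), and the input words are transcribed from the -d 0 500 dump with n = 80h words.

```python
# Model of the DEBUG test above (not the book's code): numbers are lists of
# 16-bit words, least significant word first, as they lie in memory.

N = 0x80  # number of words, the value stored at [0500]


def las_add(a, b):
    """Add word by word with carry, like CLC then LODSW / ADC AX,[BX] / STOSW."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & 0xFFFF)   # 16-bit result word
        carry = s >> 16          # carry into the next word
    return out, carry


def las_sub(a, b):
    """Subtract b from a word by word with borrow, like the SBB loop."""
    out, borrow = [], 0
    for x, y in zip(a, b):
        d = x - y - borrow
        out.append(d & 0xFFFF)   # two's-complement wrap, as the hardware does
        borrow = 1 if d < 0 else 0
    return out, borrow


def lsig(a):
    """Count significant words from the top and return the leading word."""
    n = len(a)
    while n > 0 and a[n - 1] == 0:
        n -= 1
    return n, (a[n - 1] if n else 0)


# DATA [0] as dumped: 0002h in word 0, FFFEh in word 20h,
# FFFFh in words 21h..3Eh, 0FFFh in word 3Fh, zeros above.
src1 = [0] * N
src1[0] = 0x0002
src1[0x20] = 0xFFFE
for i in range(0x21, 0x3F):
    src1[i] = 0xFFFF
src1[0x3F] = 0x0FFF

# DATA [100]: FFFFh in words 0..1Fh, zeros above.
src2 = [0xFFFF] * 0x20 + [0] * (N - 0x20)

total, carry = las_add(src1, src2)    # [0] + [100] -> [300]
moved = list(total)                   # [300] -> [400] (REPZ MOVSW)
diff, borrow = las_sub(moved, src1)   # [400] - [0] -> [400]
words, lead = lsig(src1)              # LSIG on [0]

print(f"{total[0]:04X} {total[0x20]:04X} {total[0x3F]:04X} carry={carry}")
print(diff == src2, borrow)
print(words, f"{lead:04X}", lead >> 15)
```

It prints `0001 FFFF 0FFF carry=0`, `True 0` and `64 0FFF 0`: the sum matches DATA [300], the subtraction gives back DATA [100] exactly (which is why DATA [400] after subtraction looks like DATA [100]), and LSIG yields 40h significant words with leading word 0FFFh and a clear sign bit, in line with CX=0040, AX=0FFF and PL in the last register dump.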

These macros, including ‘condh’ and ‘conhd’ that we discussed in section 3 of this chapter, can conveniently be used in the large-number programs we have been seeing. Sometimes we need to convert little-endian data to big-endian data without changing its location. This is the same as the in-situ array reversal program given under section 6 of Chapter 5. A slightly modified macro, tailored to large-number handling in terms of word arrays, is given below without detailed comments, for your study. The macro assumes the direction flag is clear, so that the string operations are in the address-incrementing mode.

data segment
arr     dw 1, 2, 3, 4, 5, 7
n       dw ($ - arr)/2
data ends
;
code segment
        assume cs:code, ds:data, es:data
;
; now the macro revar is defined
revar   macro src, n
        local rev1, rev2
        mov si, offset [src]
        mov di, si
        add si, n
        add si, n
        jmp rev1
rev2:   mov ax, [di]
        xchg ax, [si]
        stosw
rev1:   sub si, 2
        cmp si, di
        ja rev2
        endm
;
Strt:   mov ax, data
        mov ds, ax
        mov es, ax
        revar arr, n
        int 1
code ends
        end strt

The test program and the macro are given without test results. The verification of the program is left to the reader.
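Since verification is left to the reader, a small Python model of the macro's two-pointer logic gives the result to expect. This is a sketch assuming n holds the word count (6 for the test array); the variable names di and si merely follow the registers, and they count words here, whereas the macro's registers count bytes.

```python
def revar(words):
    """Reverse a word array in place with two pointers, like the revar macro.

    di starts at the first word and si one word past the last; si steps down
    before each compare, and [di] and [si] are exchanged while si > di.
    Indices count words here, whereas the macro's registers count bytes.
    """
    di, si = 0, len(words)
    while True:
        si -= 1                  # rev1: sub si, 2
        if not si > di:          # cmp si, di / ja rev2
            break
        words[di], words[si] = words[si], words[di]   # rev2: mov, xchg, stosw
        di += 1                  # stosw advances di by one word
    return words


arr = [1, 2, 3, 4, 5, 7]   # the test data of the program above
print(revar(arr))
```

For the test data the model gives `[7, 5, 4, 3, 2, 1]`, so if the macro works as intended, a DEBUG dump of arr after it runs would be expected to show 07 00 05 00 04 00 03 00 02 00 01 00 (each word stored little endian).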

9. A Table of the first 16K prime numbers in Decimal: In section 7 of Chapter 5, we saw how to check whether a given number is a prime, using a table of primes limited to the square root of the number being checked. We may start with a table that holds just two prime numbers, 2 and 3, and build the table while also using it as we proceed (satisfy yourself about this statement). Below, we give a program that builds its own prime number table as it progresses.
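The statement that the table can be grown from 2 and 3 alone holds because, when a candidate n is tested, every prime not exceeding the square root of n is smaller than n and is therefore already in the table. The Python sketch below models that idea under simplifying assumptions: Python integers stand in for the CX:BX double word, the `%` remainder test stands in for DIV BP, and the name prime_table is ours, not the book's.

```python
from itertools import islice


def prime_table(count):
    """Return the first `count` primes, using the growing table as the divisor list.

    Like the assembly program, it starts from 2 and 3 and tries odd
    candidates only, so trial division starts at the second entry (3).
    """
    table = [2, 3]
    cand = 3
    while len(table) < count:
        cand += 2                             # next odd candidate
        for p in islice(table, 1, None):      # divisors 3, 5, 7, ... from the table
            if p * p > cand:                  # past the square root: prime
                table.append(cand)
                break
            if cand % p == 0:                 # divisible, including cand == p * p
                break
        else:                                 # defensive; not expected to trigger
            table.append(cand)
    return table[:count]


primes = prime_table(16384)                   # 16K primes, 4 bytes each = 64 KB
print(primes[:8], primes[63], primes[-1])
```

It prints the first eight primes, the 64th prime 311 and the 16384th prime 180503, the same values that appear at the start and end of the DEBUG dumps of the table.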

All the computations are done in hexadecimal: the first 16K prime numbers are found and stored as 4-byte double words, and these double words are then converted to decimal using a 4-byte hex to BCD conversion routine. The numbers fill the entire data segment and are stored in decimal in big-endian fashion. The largest number in the table is the 6-digit decimal prime 180503. The entire table is not presented; only samples of the output at the start and at the end are shown. The complete list can be obtained by assembling the program and running it in the DOS (DEBUG) environment. Exercise: Study the program with respect to the algorithm, the register usage and the optimizations done in it.
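The conversion in the program that follows splits each number n into n / 10000 and a remainder, turns the small quotient into BCD by adding 6 when it is 10 or more, and splits the remainder by 100 into two digit pairs, each packed with AAM and nibble rotations. A Python model of that sequence is sketched here, assuming n < 200000 (true for this table, whose largest entry is 180503); aam, pack_pair and to_bcd_be are illustrative names. The first eight primes, all below 20, take a shorter path in the program but end up in the same 4-byte format.

```python
def aam(al):
    """AAM: AH = AL // 10, AL = AL % 10 (two unpacked BCD digits)."""
    return al // 10, al % 10


def pack_pair(v):
    """Packed BCD byte for 0 <= v <= 99: AAM, then the nibble rotations."""
    tens, units = aam(v)
    return (tens << 4) | units


def to_bcd_be(n):
    """Four big-endian packed BCD bytes for 0 <= n < 200000.

    Follows the listing: DIV BP by 10000; the quotient (below 20) becomes
    BCD by adding 6 when it is 10 or more; the remainder is split by
    DIV DL (by 100) into two digit pairs.
    """
    if not 0 <= n < 200000:
        raise ValueError("the ADD 6 step only covers a quotient below 20")
    q, r = divmod(n, 10000)
    q_bcd = q + 6 if q >= 10 else q
    hi_pair, lo_pair = divmod(r, 100)
    return bytes([0, q_bcd, pack_pair(hi_pair), pack_pair(lo_pair)])


print(to_bcd_be(180503).hex(" "))   # last entry of the table
```

It prints `00 18 05 03`, the last four bytes of the table. The ADD 6 step works because, for a value from 10 to 19, adding 6 carries the excess over 9 into the upper nibble, which is exactly the packed BCD form.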

A DECIMAL LIST OF FIRST 16K PRIME NUMBERS AND THE PROGRAM THAT PRODUCED THEM

data segment
table   dw 32768 dup(?)
data ends
;
stak segment stack
        dw 256 dup(?)
tos     label word
stak ends
;
code segment
        assume cs:code, ds:data, es:data, ss:stak
start:  mov ax, data
        mov ds, ax
        mov es, ax
        mov ax, stak
        mov ss, ax
        lea ax, tos
        mov sp, ax
        mov ax, 2
        sub dx, dx
        mov cx, dx
        cld
        sub di, di
        stosw           ; first prime number 2, stored
        xchg ax, dx
        stosw           ; stored as a double word
        xchg ax, dx
        inc ax          ; next prime, 3
        stosw
        xchg ax, dx
        stosw
        xchg ax, dx     ; stored as a double word
        mov bx, ax      ; cx:bx is the number to be checked if prime
nextp:  or di, di       ; di is the address for storing the next prime
        jz finish       ; di wraps round to 0 when the segment is full
nextp2: add bx, 2       ; try if the next odd number is a prime
        adc cx, 0       ; if carry, increment cx
        mov si, 4       ; trial divisors start at the second entry, 3
next:   lodsw           ; low word of the next tabled prime
        cmp ax, 65535   ; what is this check? This ensures we do not get too large
                        ; a number as the prime. Actually this is unnecessary.
                        ; (The DEBUG listing below shows it assembled as CMP AX,0001.)
        jz finish
        mov bp, ax
        add si, 2       ; skip the high word
        mul ax          ; dx:ax = bp * bp
        cmp cx, dx
        jz proc1
        jnb procm       ; if number-now is > bp*bp, then divide number-now by bp
over:   mov ax, bx      ; bp*bp exceeds number-now, so it is prime
        stosw
        mov ax, cx      ; store it as a double word
        stosw
        jmp nextp       ; go on to check the next odd number
proc1:  cmp bx, ax      ; high words equal: compare the low words
        jb over         ; square is more than number-now: prime, store it
        jz nextp2       ; the number is a square, so not prime
procm:  mov ax, bx
        mov dx, cx
        div bp
        or dx, dx
        jnz next        ; not divisible: try the next tabled prime
        jmp nextp2      ; divisible: not prime
finish: mov cx, 8
        sub si, si
        mov di, si
bak:    lodsw
        mov dx, ax
        lodsw           ; first 8 numbers converted to decimal
        stosw
        mov ax, dx
        cmp ax, 10
        jb dwn
        add ax, 6
dwn:    xchg ah, al
        stosw
        loop bak
lup2:   lodsw
        mov dx, ax
        lodsw
        xchg ax, dx     ; dx:ax = the prime
        mov bp, 10000
        div bp          ; ax = prime / 10000, dx = remainder
        cmp ax, 10
        jb dn1
        add ax, 6
dn1:    xchg ah, al
dn:     stosw
        mov ax, dx
        mov dx, 100     ; remaining double hex words converted to BCD
        div dl
        mov cx, 4
        xchg ah, ch
        aam
        ror al, cl
        ror ax, cl
        xchg al, ch
        aam
        rol al, cl
        rol ax, cl
        mov al, ch
        stosw
        cmp si, 0000
        jnz lup2
ok:     int 1
code ends
        end start

Testing in the Debug Environment
-u 0 b3
23F6:0000 B8D613        MOV     AX,13D6
23F6:0003 8ED8          MOV     DS,AX
23F6:0005 8EC0          MOV     ES,AX
23F6:0007 B8D623        MOV     AX,23D6
23F6:000A 8ED0          MOV     SS,AX
23F6:000C 8D060002      LEA     AX,[0200]
23F6:0010 8BE0          MOV     SP,AX
23F6:0012 B80200        MOV     AX,0002
23F6:0015 2BD2          SUB     DX,DX
23F6:0017 8BCA          MOV     CX,DX
23F6:0019 2BFF          SUB     DI,DI
23F6:001B AB            STOSW
23F6:001C 92            XCHG    DX,AX
23F6:001D AB            STOSW
23F6:001E 92            XCHG    DX,AX
23F6:001F 40            INC     AX
23F6:0020 AB            STOSW
23F6:0021 92            XCHG    DX,AX
23F6:0022 AB            STOSW
23F6:0023 92            XCHG    DX,AX
23F6:0024 8BD8          MOV     BX,AX
23F6:0026 0BFF          OR      DI,DI
23F6:0028 7436          JZ      0060
23F6:002A 83C302        ADD     BX,+02
23F6:002D 83D100        ADC     CX,+00
23F6:0030 BE0400        MOV     SI,0004
23F6:0033 AD            LODSW
23F6:0034 3D0100        CMP     AX,0001
23F6:0037 7427          JZ      0060
23F6:0039 8BE8          MOV     BP,AX
23F6:003B 83C602        ADD     SI,+02
23F6:003E F7E0          MUL     AX
23F6:0040 3BCA          CMP     CX,DX
23F6:0042 740A          JZ      004E
23F6:0044 730E          JNB     0054
23F6:0046 8BC3          MOV     AX,BX
23F6:0048 AB            STOSW
23F6:0049 8BC1          MOV     AX,CX
23F6:004B AB            STOSW
23F6:004C EBD8          JMP     0026
23F6:004E 3BD8          CMP     BX,AX
23F6:0050 72F4          JB      0046
23F6:0052 74D6          JZ      002A
23F6:0054 8BC3          MOV     AX,BX
23F6:0056 8BD1          MOV     DX,CX
23F6:0058 F7F5          DIV     BP
23F6:005A 0BD2          OR      DX,DX
23F6:005C 75D5          JNZ     0033
23F6:005E EBCA          JMP     002A
23F6:0060 B90800        MOV     CX,0008
23F6:0063 2BF6          SUB     SI,SI
23F6:0065 8BFE          MOV     DI,SI
23F6:0067 AD            LODSW
23F6:0068 8BD0          MOV     DX,AX
23F6:006A AD            LODSW
23F6:006B AB            STOSW
23F6:006C 8BC2          MOV     AX,DX
23F6:006E 3D0A00        CMP     AX,000A
23F6:0071 7203          JB      0076
23F6:0073 050600        ADD     AX,0006
23F6:0076 86E0          XCHG    AH,AL
23F6:0078 AB            STOSW
23F6:0079 E2EC          LOOP    0067
23F6:007B AD            LODSW
23F6:007C 8BD0          MOV     DX,AX
23F6:007E AD            LODSW
23F6:007F 92            XCHG    DX,AX
23F6:0080 BD1027        MOV     BP,2710
23F6:0083 F7F5          DIV     BP
23F6:0085 3D0A00        CMP     AX,000A
23F6:0088 7203          JB      008D
23F6:008A 050600        ADD     AX,0006
23F6:008D 86E0          XCHG    AH,AL
23F6:008F AB            STOSW
23F6:0090 8BC2          MOV     AX,DX
23F6:0092 BA6400        MOV     DX,0064
23F6:0095 F6F2          DIV     DL
23F6:0097 B90400        MOV     CX,0004
23F6:009A 86E5          XCHG    AH,CH
23F6:009C D40A          AAM
23F6:009E D2C8          ROR     AL,CL
23F6:00A0 D3C8          ROR     AX,CL
23F6:00A2 86C5          XCHG    AL,CH
23F6:00A4 D40A          AAM
23F6:00A6 D2C0          ROL     AL,CL
23F6:00A8 D3C0          ROL     AX,CL
23F6:00AA 8AC5          MOV     AL,CH
23F6:00AC AB            STOSW
23F6:00AD 83FE00        CMP     SI,+00
23F6:00B0 75C9          JNZ     007B
23F6:00B2 CD01          INT     01
; 92 instructions and 179 memory locations used.
-g
AX=0305  BX=C117  CX=0504  DX=0064  SP=0200  BP=2710  SI=0000  DI=0000
DS=13D6  ES=13D6  SS=23D6  CS=23F6  IP=00B4   NV UP EI PL ZR NA PE NC
23F6:00B4 0000          ADD     [BX+SI],AL                         DS:C117=71
-d 0 FF         ; first 64 prime numbers in decimal
13D6:0000  00 00 00 02 00 00 00 03-00 00 00 05 00 00 00 07   ................
13D6:0010  00 00 00 11 00 00 00 13-00 00 00 17 00 00 00 19   ................
13D6:0020  00 00 00 23 00 00 00 29-00 00 00 31 00 00 00 37   ...#...)...1...7
13D6:0030  00 00 00 41 00 00 00 43-00 00 00 47 00 00 00 53   ...A...C...G...S
13D6:0040  00 00 00 59 00 00 00 61-00 00 00 67 00 00 00 71   ...Y...a...g...q
13D6:0050  00 00 00 73 00 00 00 79-00 00 00 83 00 00 00 89   ...s...y........
13D6:0060  00 00 00 97 00 00 01 01-00 00 01 03 00 00 01 07   ................
13D6:0070  00 00 01 09 00 00 01 13-00 00 01 27 00 00 01 31   ...........'...1
13D6:0080  00 00 01 37 00 00 01 39-00 00 01 49 00 00 01 51   ...7...9...I...Q
13D6:0090  00 00 01 57 00 00 01 63-00 00 01 67 00 00 01 73   ...W...c...g...s
13D6:00A0  00 00 01 79 00 00 01 81-00 00 01 91 00 00 01 93   ...y............
13D6:00B0  00 00 01 97 00 00 01 99-00 00 02 11 00 00 02 23   ...............#
13D6:00C0  00 00 02 27 00 00 02 29-00 00 02 33 00 00 02 39   ...'...)...3...9
13D6:00D0  00 00 02 41 00 00 02 51-00 00 02 57 00 00 02 63   ...A...Q...W...c
13D6:00E0  00 00 02 69 00 00 02 71-00 00 02 77 00 00 02 81   ...i...q...w....
13D6:00F0  00 00 02 83 00 00 02 93-00 00 03 07 00 00 03 11   ................
-d FF00 FFFF    ; last 64 of the 16K prime numbers in decimal
13D6:FF00  00 17 98 07 00 17 98 13-00 17 98 19 00 17 98 21   ...............!
13D6:FF10  00 17 98 27 00 17 98 33-00 17 98 49 00 17 98 97   ...'...3...I....
13D6:FF20  00 17 98 99 00 17 99 03-00 17 99 09 00 17 99 17   ................
13D6:FF30  00 17 99 23 00 17 99 39-00 17 99 47 00 17 99 51   ...#...9...G...Q
13D6:FF40  00 17 99 53 00 17 99 57-00 17 99 69 00 17 99 81   ...S...W...i....
13D6:FF50  00 17 99 89 00 17 99 99-00 18 00 01 00 18 00 07   ................
13D6:FF60  00 18 00 23 00 18 00 43-00 18 00 53 00 18 00 71   ...#...C...S...q
13D6:FF70  00 18 00 73 00 18 00 77-00 18 00 97 00 18 01 37   ...s...w.......7
13D6:FF80  00 18 01 61 00 18 01 79-00 18 01 81 00 18 02 11   ...a...y........
13D6:FF90  00 18 02 21 00 18 02 33-00 18 02 39 00 18 02 41   ...!...3...9...A
13D6:FFA0  00 18 02 47 00 18 02 59-00 18 02 63 00 18 02 81   ...G...Y...c....
13D6:FFB0  00 18 02 87 00 18 02 89-00 18 03 07 00 18 03 11   ................
13D6:FFC0  00 18 03 17 00 18 03 31-00 18 03 37 00 18 03 47   .......1...7...G
13D6:FFD0  00 18 03 61 00 18 03 71-00 18 03 79 00 18 03 91   ...a...q...y....
13D6:FFE0  00 18 04 13 00 18 04 19-00 18 04 37 00 18 04 63   ...........7...c
13D6:FFF0  00 18 04 73 00 18 04 91-00 18 04 97 00 18 05 03   ...s............
-q

Only the first and the last 256 bytes of the displayed results (64 numbers in each 256-byte block) are shown above. The last prime number in the table is the decimal number 180503. The result occupies the entire data segment (16K double words of 4 bytes each, that is, 64 KB). Note the separate stack segment used in this program.
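When reading these dumps, each 4-byte group is one big-endian packed BCD number, so its hex digits are simply the decimal digits of the prime. A small Python helper for checking the output is sketched below; decode_dump_row is a hypothetical name, and it assumes the row text starts with the 16 hex bytes, as DEBUG prints them after the address.

```python
def decode_dump_row(row):
    """Decode one DEBUG dump row holding four big-endian packed BCD numbers.

    `row` is the text after the address, e.g.
    "00 18 04 73 00 18 04 91-00 18 04 97 00 18 05 03   ...s............".
    The ASCII column at the end is ignored.
    """
    hex_bytes = row.replace("-", " ").split()[:16]
    numbers = []
    for k in range(0, 16, 4):
        digits = "".join(hex_bytes[k:k + 4])   # e.g. "00180473"
        numbers.append(int(digits, 10))        # BCD nibbles are decimal digits
    return numbers


print(decode_dump_row("00 18 04 73 00 18 04 91-00 18 04 97 00 18 05 03   ...s............"))
```

Applied to the 13D6:FFF0 row it gives 180473, 180491, 180497 and 180503. A byte containing a hex digit A to F would make `int(..., 10)` raise ValueError, which conveniently flags a row that is not valid BCD.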

Conclusion: In this chapter we have seen how large integers can be handled on the 8086 using only assembly language programming, without any special tools. These programs illustrate the capability of the processor hardware and its instruction set. In the previous chapters we learnt about the processor register set and the instruction set architecture, and looked at some simple programs. In this last chapter, we have seen how those simple programs can be used to work out more complex number-handling routines. When we design big programs, the basic principles learnt from the simpler programs of the earlier chapters are still useful, but complex programs require careful management of the available resources and proper tracking of the algorithm being employed. Essentially, it is all about balancing the available resources against the requirements of our algorithm, and this is greatly helped by adequate comments, so that the program can be easily understood and debugged.

==00==

EXERCISES

1. Modify any one of the large number handling programs given in this Chapter so that it uses the macros given at the end in section 8.

2. Write a program to invert a 20-byte number using the method of section 6.



APPENDIX A

In this book I have shown the working and the results produced while running .exe programs in the DEBUG environment. As you have seen, it is quite useful to produce a permanent copy of the working, with results, for later use and demonstration. In this appendix I explain the method I have used for obtaining an MSWord file showing the working and results from DEBUG. The method is described below:

In the DOS environment, invoke DEBUG with the program file as its parameter, debug filename.exe. If this is followed by pressing the <enter> key, DEBUG works as usual. If instead the parameter is followed by >filename.dem, the output from DEBUG goes to the .dem file; its responses then do not appear on the screen at all but go directly into the .dem file. (In Unix, the tee command sends the output both to the screen, which is the standard output device, and to a named file, which is quite convenient if you are using a Unix system. It is not so straightforward to do this in Windows, though I am sure a similar facility can be had there too, and there are tricks to overcome this deficiency; the site http://www.netikka.net/tsneti/info/tscmd.htm, under FAQ index number 94, describes some that may be useful.) I have not used any of these methods; instead I have operated (blindly) in the DEBUG environment, redirecting the output directly to a .dem file without any visual feedback on the screen. The .dem file can then be copied into an MSWord file and manipulated like a .doc file. As a .dem file it cannot easily be manipulated, because it is not a regular MSWord file but a plain text file, like one created with Notepad.

For example, to test the program style1.exe (refer to Chapter 3, programming style 1), the command sequence could be as follows:

From the DOS prompt, give the command “debug style1.exe > style1.dem” and press the <enter> key as usual. You will notice the DEBUG prompt “-”, but from then on none of DEBUG's responses will be seen on the screen; they go directly to the style1.dem file. The DEBUG commands therefore have to be given correctly without any visible feedback. For the style1 demo, the following sequence of commands was given:

u 0 30 <enter>
rax <enter>
ffef <enter>
r <enter>
t16 <enter>
q <enter>

These command sequences have to be worked out beforehand by running DEBUG interactively for the required operations. Once the .dem file is obtained, it can conveniently be copied into an MSWord file and edited wherever necessary. In this way you get an almost hands-on record for studying assembly language programs, with a hard copy of the working of the different programs.



http://www.netikka.net/tsneti/info/tscmd.htm
