Floating Point Arithmetic

There are two methods that can be used to represent binary numbers in a computer: the fixed point system and the floating point system. In a fixed point system, binary numbers are expressed as fractions, with the radix point positioned immediately to the right of the sign digit.


For example, in a machine using 8-bit registers, 1110.0 would be represented as 0.1110000 ($= 1110 \times 2^{-4}$) by moving the radix point four places to the left. Unfortunately, there are problems associated with fixed point arithmetic. If the sum of two 8-bit numbers is greater than +127 or less than -127, an additional bit is generated and an incorrect answer is obtained: assuming 8-bit registers are being used in the machine, the range of the registers has been exceeded. The same problem exists for the multiplication and division operations.
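The fixed point behaviour described above can be sketched in Python. This is a minimal illustration, not a description of any particular machine. It assumes a sign-magnitude register with one sign bit and seven magnitude bits, which is consistent with the ±127 range mentioned in the text. The function names `to_fraction_register` and `add_same_sign` are hypothetical. The first function moves the radix point left until the value is a pure fraction. The second adds two magnitudes and shows the extra bit being lost.

```python
# Minimal sketch of an 8-bit sign-magnitude fixed point register (an
# assumption): 1 sign bit, then 7 magnitude bits, radix point after the sign.
MAG_BITS = 7
MAX_MAG = (1 << MAG_BITS) - 1  # largest magnitude a register can hold: 127


def to_fraction_register(int_bits: str, negative: bool = False) -> tuple[str, int]:
    """Move the radix point of a binary integer left until it is a fraction.

    Returns the register contents as 'sign.magnitude' and the scale
    exponent k such that fraction == integer * 2**k (for '1110', k = -4).
    """
    shift = len(int_bits)  # places the radix point moves to the left
    if shift > MAG_BITS:
        raise OverflowError(f"{int_bits} needs more than {MAG_BITS} magnitude bits")
    magnitude = int_bits.ljust(MAG_BITS, "0")  # pad unused low-order bits with 0
    sign = "1" if negative else "0"
    return f"{sign}.{magnitude}", -shift


def add_same_sign(a: int, b: int) -> tuple[int, bool]:
    """Add two non-negative 7-bit magnitudes as the register would.

    Returns the 7-bit result actually kept and whether an extra bit was
    generated (i.e. the true sum exceeded 127 and was lost).
    """
    raw = a + b
    overflow = raw > MAX_MAG        # an eighth magnitude bit was produced
    return raw & MAX_MAG, overflow  # only the low 7 bits survive


if __name__ == "__main__":
    print(to_fraction_register("1110"))  # ('0.1110000', -4)
    print(add_same_sign(100, 50))        # (22, True)
```

The true sum 150 needs eight magnitude bits, so the register keeps only 150 - 128 = 22. This is the incorrect answer the text refers to.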


If two 8-bit numbers are multiplied together, the product will in many cases be double-length and would require a 16-bit register. Similarly, for division, a fractional quotient can only be formed if the magnitude of the divisor is greater than that of the dividend.
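Both limitations can be checked with a short sketch. Like the previous example, it assumes 7-bit magnitudes from an 8-bit sign-magnitude register. The helpers `multiply_magnitudes` and `fractional_quotient` are hypothetical names used only for illustration.

```python
# Minimal sketch (assumption: 7-bit magnitudes, as in a sign-magnitude
# 8-bit register) of why multiplication and division strain fixed point.
MAG_BITS = 7


def multiply_magnitudes(a: int, b: int) -> tuple[int, int]:
    """Return the exact product of two 7-bit magnitudes and its bit length."""
    product = a * b
    return product, product.bit_length()


def fractional_quotient(dividend: int, divisor: int) -> int:
    """Return |dividend / divisor| as 7 fraction bits, truncated.

    A fixed point fraction can hold the quotient only when the divisor's
    magnitude exceeds the dividend's; otherwise the quotient is >= 1.
    """
    if abs(divisor) <= abs(dividend):
        raise ValueError("quotient would be >= 1 and cannot be held as a fraction")
    return (abs(dividend) << MAG_BITS) // abs(divisor)


if __name__ == "__main__":
    print(multiply_magnitudes(127, 127))             # (16129, 14)
    print(format(fractional_quotient(3, 4), "07b"))  # 1100000, i.e. 0.1100000
```

The largest product, 127 × 127 = 16129, needs 14 magnitude bits. With the sign bit that is 15 bits, so it only fits in a double-length (16-bit) register. Calling `fractional_quotient(5, 4)` raises `ValueError`, because 5/4 is not a fraction.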


To overcome the range problems experienced with fixed point representation, a floating point system can be used. Numbers in this system are expressed in the form $n = m \times 2^{e}$, where $m$, the mantissa, is the fractional representation of $n$ and $e$ is the exponent.



When performing a computation, a normalized form of the mantissa is used. Normalization is achieved by adjusting the exponent so that the mantissa has a 1 in its most significant digit position. When this condition is satisfied, $0.5 \le |m| < 1$.
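The normalization step can be sketched as a loop that shifts the mantissa one binary place at a time and compensates in the exponent. The sketch uses Python floats and ignores any limit on mantissa width. The function name `normalize` is hypothetical. Python's standard `math.frexp` returns the same $(m, e)$ convention, with $0.5 \le |m| < 1$ for nonzero input, so it serves as a cross-check.

```python
import math


def normalize(mantissa: float, exponent: int) -> tuple[float, int]:
    """Normalize m * 2**e so that 0.5 <= |m| < 1, keeping the value unchanged.

    Halving or doubling the mantissa shifts its radix point one binary place;
    the exponent is adjusted in the opposite direction to compensate.
    """
    if mantissa == 0:
        return 0.0, 0  # zero has no 1 bit to move into the leading position
    while abs(mantissa) >= 1:
        mantissa /= 2  # radix point moves one place to the left
        exponent += 1
    while abs(mantissa) < 0.5:
        mantissa *= 2  # a leading zero is shifted out
        exponent -= 1
    return mantissa, exponent


if __name__ == "__main__":
    print(normalize(14.0, 0))      # (0.875, 4): 1110.0 = 0.1110000 x 2**4
    print(normalize(0.109375, 0))  # (0.875, -3): 0.000111 = 0.111 x 2**-3
    print(math.frexp(14.0))        # (0.875, 4), same convention
```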


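The same idea applies in decimal, where a normalized mantissa has a nonzero first digit, so that $0.1 \le |m| < 1$. For example:

$+1492.9187 = +.14929187 \times 10^{+4}$
$-.00034123 = -.34123000 \times 10^{-3}$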


An alternative way of expressing these normalized numbers would be:
 
$+.14929187 \times 10^{+4} \equiv +.14929187\mathrm{e}{+4}$
$-.34123000 \times 10^{-3} \equiv -.34123000\mathrm{e}{-3}$
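Decimal normalization and the "e" notation above can be sketched with Python's standard `decimal` module. The sketch assumes an eight-digit mantissa, taken from the examples, and rejects inputs with more significant digits rather than rounding them. The names `normalize_decimal` and `e_notation` are hypothetical.

```python
from decimal import Decimal

DIGITS = 8  # mantissa width, taken from the eight-digit examples in the text


def normalize_decimal(text: str) -> tuple[str, str, int]:
    """Return (sign, mantissa digits, exponent) with 0.1 <= |m| < 1."""
    # normalize() strips trailing zeros, so digits[0] is the first nonzero digit
    sign_bit, digits, exp = Decimal(text).normalize().as_tuple()
    if digits == (0,):
        raise ValueError("zero has no normalized form")
    if len(digits) > DIGITS:
        raise ValueError(f"{text} has more than {DIGITS} significant digits")
    mantissa = "".join(str(d) for d in digits).ljust(DIGITS, "0")
    exponent = len(digits) + exp  # value == 0.d1d2...dn x 10**exponent
    return ("-" if sign_bit else "+"), mantissa, exponent


def e_notation(text: str) -> str:
    """Format a decimal string as a normalized mantissa in 'e' notation."""
    sign, mantissa, exponent = normalize_decimal(text)
    return f"{sign}.{mantissa}e{exponent:+d}"


if __name__ == "__main__":
    print(e_notation("+1492.9187"))  # +.14929187e+4
    print(e_notation("-.00034123"))  # -.34123000e-3
```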


Assuming that the bias constant to be added to the exponent is 16 and that the exponent part of the numbers is positioned to the left of the fractional part, the two numbers would have the following form: 


$+.14929187 \times 10^{+4} \equiv +20,14929187$
$-.34123000 \times 10^{-3} \equiv -13,34123000$

Here 20 = 4 + 16 and 13 = -3 + 16 are the biased exponents.


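Adding the constant 16 to the exponent allows any exponent from $-16$ to $+15$ (scale factors from $10^{-16}$ to $10^{+15}$) to be stored as a non-negative two-digit number, 00 to 31, with no separate sign for the exponent, and consequently increases the range of numbers the machine can handle.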
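The biased-exponent layout can be sketched as a pair of packing and unpacking helpers. The sketch assumes the layout shown above: a leading sign, then the two-digit biased exponent, then the eight mantissa digits. The comma only mirrors the notation used in the text. The bias of 16 and the exponent range $-16$ to $+15$ come from the text. The function names `encode` and `decode` are hypothetical.

```python
BIAS = 16                   # constant added to every exponent (from the text)
MIN_EXP, MAX_EXP = -16, 15  # exponent range stated in the text


def encode(sign: str, mantissa: str, exponent: int) -> str:
    """Pack sign, biased two-digit exponent and mantissa, exponent first."""
    if not MIN_EXP <= exponent <= MAX_EXP:
        raise OverflowError(f"exponent {exponent} outside {MIN_EXP}..{MAX_EXP}")
    return f"{sign}{exponent + BIAS:02d},{mantissa}"


def decode(word: str) -> tuple[str, str, int]:
    """Split a packed word back into sign, mantissa digits and true exponent."""
    sign, body = word[0], word[1:]
    biased, mantissa = body.split(",")
    return sign, mantissa, int(biased) - BIAS


if __name__ == "__main__":
    print(encode("+", "14929187", 4))   # +20,14929187
    print(encode("-", "34123000", -3))  # -13,34123000
    print(decode("-13,34123000"))       # ('-', '34123000', -3)
```

Because the smallest allowed exponent, $-16$, becomes 00, every stored exponent is non-negative. The exponent field therefore needs no sign digit of its own.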