The common application paper: Booth Multiplier

Low Power carrell Multiplier by Effective Capacitance Minimization P. Nageshwar Reddy Dr. Damu Radhakrishnan Stu. in SUNY, New Paltz, NY Prof. in SUNY, New Paltz, NY Abstract In this paper we present an energy efficient parallel multiplier factor factor trope based on intentful content minimization. all the overtone product decrease defend in the multiplier is considered in our research. The sound electrical capacity is the product of capacitance and sack military action. Hence to minimize the effective capacitance in our visualizeing, we decided to ensure that the shift key activity of nodes with high capacitances is kept to a minimum.This is achieved in our design by wiring the higher switching activity sign ups to nodes with lower capacitance and vice versa for the 42 compressor and luxuriant common viper cells, as spunking the initial prospect of each partial product arcminute as 0. 25. This bring down the overall switching capacitance, thereby reducing th e total great role custom in the multiplier. Power psychoanalysis is d wiz by synthesizing our design on Spartan-3E FPGA and employ XPower Analyzer tool that is provided in ISE Xilinx 10. 1. The dynamic proponent for our 16? 16 multiplier was measured as 360. 4mW, and the total power 443. 31mW. This is 17. 4% less comp bed to the most recent design. Also we noticed that our design has the lowest power-clutch product compared to the multiplier presented in the literature. Index Terms- carrel multiplier, Effective capacitance, 42 compressor. 1. Introduction A multiplier is the most frequently used fundamental arithmetic unit in various digital systems such as computers, member controllers and house processors. Thus it has become a major source of power dissipation in these digital systems.With the exp iodinential growth of portable systems that are operated on batteries, power decrement has become whizz of the primary design constraints in recent years. In the present era , each and every electronic device is use use CMOS technology. The 3 major sources of power dissipation in digital CMOS circuits are dynamic, short circuit and leakage 1. Generally, power reducing proficiencys aim at minimizing all the above menti adeptd power dissipation sources but our emphasis is on dynamic power dissipation as it dominates other power dissipation sources in digitalCMOS circuits. The switching or dynamic power dissipation occurs due to the charging and discharging of capacitors at different nodes in a circuit 2. The average dynamic power consumption of a digital circuit with N nodes is given by where VDD is the supply voltage, Ci is the load capacitance at node i, fCLK is the clock frequency and ? i is the switching activity at node i. The product of switching activity and load capacitance at a node is called effective capacitance.Assuming only one system of system of logical system change per clock cycle, the switching activity at a node i can be defined as the probability that the logic value at the node changes (0->1 or 1->0) between devil consecutive clock cycles. For a given logic element, the switching activity at its output(s) can be computed utilise the probability of its excitants and is given by where and denote the probability of occurrence of a one and zero at node i respectively. When Pi = 0. 5, the switching activity at a node is maximum and it decreases as it goes towards the ii extreme values (i. e. both from 0. to 0 and 0. 5 to 1). The two main low power design strategies for dynamic power reduction are based on (i) supply voltage reduction and (ii) the effective capacitance minimization. The reduction of supply voltage is one of the most aggressive techniques because the power savings are significant due to the quadratic dependence on VDD. Although such reduction is usually very effective, it increases leakage current in the transistors and as well as decreases circuit speed. The minimization of effective swi tching capacitance involves reducing switching activity or node capacitance.The node capacitance depends on the integration technology used. To reduce switching activity only requires a detailed analysis of signal transit probabilities, and implementation of various circuit level design techniques, such as logic synthesis optimization and balanced paths. It is independent of the technology used and is less expensive. Admiring the advantages of switching activity reduction, this paper focuses on switching activity reduction techniques in a multiplier. Digital Multiplication is make in three steps in a Booth coded multiplier.The graduation exercise step is to generate all the partial products in parallel using Booth recoding. In the arc moment step these partial products are reduced to 2 operands in several wooden legs by applying Wallace/Dadda rules. These stagecoachs follow one after the other, reaching the output of one stage to the next. The final step is adding the two ope rands using a carry propagate common viper to produce the final sum. Our main focus in this paper is the second step, partial product reduction. Fig. 1 shows the modified Dadda reduction tree for a 6? 6 unsigned multiplier, which uses spacious adders (FA) and half adders (HA) as basal elements.Stage 1 is the rearranged 6? 6 unsigned partial product array obtained using the partial product generator. At every partial product reduction(PPR) stage the cast of bits with the similar order (bits in a column) are assemblyed together and connected to adder cells following Daddas rules. Each column exhibits partial products of a certain order. The sum output of a FA or HA at one stage impart place a dot in the like column at the next stage and an output carry in the column to the left in the next stage (i. e. one order of magnitude higher). Fig. 1. Modified Dadda reduction tree for 6? unsigned multiplication The Wallace and Dadda designs use only FAs and HAs in the reduction stages , which form an irregular layout and increases wiring complexness. Wiring complexity is a measure of power. Since indeed Weinberger 3 has proposed a 42 compressor, the majority of the multiplier designs today soak up use of 42 compressors to increase the carrying out of the multiplier. They also contribute to power reduction as they decrease the wiring capacitance due to a more regular layout, contributing to fewer transitions in the partial product reduction tree. It also reduces hardware cost.The design of the 42 compressor got impoved in time, and modified design presented by Jiang et al. claimed improvements in both check up on and power dissipation compared to earlier designs 4. Several logic and circuit level optimizations are possible by using higher order compressors instead of honest FA cells for reducing the number of transitions in the partial product reduction stage. Because of this we used 42 compressors, FA (32 compressor) and HA cells in our partial product reduc tion stages. We reduced the switching activity by minimizing the effective capacitance at every node in the circuit.This stands as the main focus of this paper. This paper is organised as follows related research in section 2 and 2. Related Research M either researchers turn over elucidated different low power multiplier architectures by using different techniques to reduce the total switching activity in a multiplier - . Ohban, et al. proposed a low power multiplier using the so called bypassing technique 5. The main judgment of their approach is to minimize the signal transitions while adding zero valued partial products. This is done by bypassing the adder stage whenever the multiplier bit is zero.Masayuki, et al. proposed an algorithm using operand decomposition technique 6. They decomposed the multiplicand and the multiplier into 4 operands and using them they generated twice the number of partial products compared to the conventional multiplier. By doing this, they reduced the one probability of each partial product bit to 1/8 while it is 1/4 in the conventional multipliers. This in turn decreases the switching probability. Chen, et al. proposed a multiplier based on effective dynamic range of the input data 7.If the data with littler effective dynamic range is Booth coded then the partial products have greater chances to be zero, which decreases the switching activities of partial products. Fujino, et al. proposed a multiply accumulate design using dynamic operand transformation technique in which current values of the input is compared with previous values 8. If more than half of the bits in an operand change then it is dynamically transformed to its twos complement in order to decrease the transition activity during multiplication. Chen, et al. roposed a low power multiplier, which uses mean power suppression technique (SPST) equipped Booth encoder 9. The SPST uses a detection logic circuit to detect whether the Booth encoder is calculating redun dant computations which yield in Zero partial product and stops such PP generation process. To implement the basic principles used in all the above mentioned multiplier architectures not only increase hardware intensity but also introduce delay in the operation. Also the extra circuitry employed to implement them consumes power.So our research interest is focused on techniques which decrease power without introducing either delay and additional hardware. Oskuii et al. proposed an algorithm based on static probabilities at the primary inputs 10. At every PP reduction stage the number of bits with the alike(p) order of magnitude (bits in a column) are grouped together and connected to the adder cells in a Dadda tree. The selection of these bits and their grouping influences the overall switching activity of the multiplier. This was expoundd in Oskuiis paper by referring to an early work, which is described below. Only one column per stage is considered here. As the generated carry bits from adders propagate from LSB towards MSB, optimization of columns is performed from LSB to MSB and from first stage to last stage. Thus it can be ensured that the optimization of columns and stages that has already been performed exit still be valid when later optimizations are being performed. * Glitches and spurious transitions spread in the reduction stage after a few layers of combinational logic. To avoid them is not feasible in most cases. Therefore it seems near to assign short paths to partial products having high switching activity.Oskuiis goal was to reduce the power in Dadda trees. The one probability for sum and carry of the FA and HA can be calculated from their functional behavior 10. According to Oskuiis algorithm, assuming the switching probabilities of partial products in a particular stage are calculated using the previous stage one probabilities and in each column and they arranged these partial product bits in ascending order. They first use the lower sw itching probability bits to feed full and half adders and transfer the higher switching probability bits to the next stage.From the set of bits to feed adders they tried to feed the highest switching probability signal to the carry input of the full adder as its path in full- adder is shorter than the other two inputs. Fig. 2. Example to illustrate Oskuiis approach 10 Fig. 2 gives an example where 7 bits with the same order of magnitude are to be added. This is shown as the shaded box in the 2nd group of bits from top in Fig. 2. According to Dadda rules of reducing a partial product tree, 2 FAs must be used and one bit will be passed to the next stage together with the sum and carry bits generated by the full adders. s for i varying from 1 to 7 represent the switching probabilities of the sevensome bits. These are sorted in ascending order and listed as ? i* with the highest one as ? 1*. According to their approach, the bit with highest switching activity is kept for the next stage i. e. in Fig. 3. 2, and assign and to the carry inputs of the two FAs as their path is shorter and the other bits to the remain inputs of FAs in any order. In this way they reduced the partial product tree by bringing the highest transition probability bits more closer to the output such that it reduces the total power in the multiplier without any extra hardware cost.Oskuii claimed that power reduction varying from 4% to 17% in multiplier designs could be achieved using their approach. On careful analysis of Oskuiis work we notice that further reduction in power can be achieved. This is elaborated in our design presented in the next section. 3. Proposed subject By using a partial product generator (PPG) for the n? n multiplier employing radix-4 Booth encoder we obtained the required partial products. These partial products are then reduced to 2 operands employing several partial product reduction (PPR) stages. We used a combination of 42 compressors, FAs and HAs in reduction sta ges.At each stage modified Dadda rules are applied to obtain operands for the next stage. While minimizing the partial product bits in each column using 42/32 compressors and HA cells, emphasis was given on higher speed and lower power. Higher speed is achieved by allowing the partial product bits to pass done a minimum number of reduction stages, while minimizing the final carry propagate adder length to the minimum. Fig. 3. Proposed PPG scheme for a 16? 16 multiplier Fig. 3 shows the proposed partial product reduction scheme for a 16? 16 parallel multiplier.Nine partial products obtained by PPG are reduced to 2 operands using 3 reduction stages. The steep green boxes in each column represent 42 compressors. It takes five bits and reduces them into 3 output bits, one sum bit in the same column position and two carry bits in the next higher significant column (one bit left) of next stage. The tumid red boxes represent full adder cells, which reduce three partial product bits in a column and generate the sum and carry bits. Similarly, the vertical blue boxes represent half adders and add two partial product bits to reduce it to 2 output bits.The order in which the inputs are fed to 42 compressor, full and half adders is discussed in the next section. In Fig. 3 the maximum number of partial products in a column is 8 (columns 14 to 17). Since we are using 42 compressors that can take up to 5 input bits, to reduce the partial products in the first stage, we want to make sure that the maximum number of partial products in the next stage is only 5. This way we can reduce the bits in each column in stage 2 using one level of 42 compressors. And in the third stage, we want to ensure that the maximum number of bits in any column is only 3, so that full adders can be used to add them.This will permit the whole reduction process to be achieved in 3 stages. The half adder in column 2 in reduction stage 1 and the full adder in column 3 in reduction stage 2 are used so a s to minimize the size of the final carry propagate adder. 4. Power decline Once the minimum number of reduction stages is established for a design, the next criterion is to minimize power consumption. This is achieved by delay passing and reducing the effective capacitance at every node in the reduction stages also following Oskuiis rules (discussed in scratch 2).To minimize the effective switching activity, the design must ensure that the switching activity of nodes with higher capacitance value must be kept to a minimum. This is achieved by a special interconnection pattern used in our design. The higher switching activity signals are wired to nodes with lower capacitance and vice versa. Our multiplier design uses the above idea to minimize power. This paper therefore focuses on selective interconnection of signals to the inputs of 42 compressors and FAs and HAs using the above concept.The logic diagram and the input capacitances for a full adder are shown in Fig. 4(a). For th e following we will assume that each and every input lead to a logic gate is considered as one unit load (C1). Hence if a signal is connected to the inputs of two logic gates, then the load is two units (C2). From the logic diagram of the full adder in Fig. 4(a), input B is connected only to an XOR gate, where as inputs A and C are connected to both an XOR and a Mux. Hence, the input capacitance of the B-input is smaller than the other two inputs.The load presented by the B input is one unit load, while the loads presented by A and C are 2 unit loads. Hence a transition on input B will result in less effective capacitance. This is represented by the capacitance values C1 (1 unit load) and C2 (2 unit loads) as shown in Fig. 4. 9. Again by comparing the three inputs, the C input goes through only one logic device (XOR gate or Mux) before it reaches the output, where as both A and B goes through two logic devices before reaching the output. Hence, a transition on any of the inputs A or B could result in output transitions on all the three logic devices.But a transition on input C will affect only two of these logic devices. Therefore we can conclude that even though the inputs A and C represent the same load, the overall switching effect on the full adder due to C input will be less than that due to A input. Hence, as a rule of thumb, the first two higher transition inputs among a set of three inputs that are given to a full adder should be connected to the B and C inputs and the last one to A. (a) (b) Fig. 4. a) FA logic diagram and input capacitances (b) 42 compressor logic diagram and input capacitances Similarly, the logic diagram of a 42 compressor and its input capacitances are shown in Fig. 4. (b). The input capacitances presented by X1, X3, X4 and Cin are twice that presented by X2. Hence, the highest transition probability signal must be connected to the X2 input. Again by using a similar argument as in the full adder, the second highest transition proba bility signal must be given to the Cin. The remaining inputs are given to X1, X3 and X4 in any order. This minimizes the overall effective capacitance in a 42 compressor.The probability of a logic one at the output of any block is a function of the probability of a logic one at its inputs 11 12. From the logic functions of 42 compressor, FA and HA we can calculate their output probabilities knowing their input probabilities. Table 2 Probability equations for 42 Compressor 42 Compressor PSUM PCout PC0 Table 1 shows the probability expression for the sum and carry outputs for the full adder and half adder in terms of their input signal probabilities. The 42 compressor output probabilities are shown in Table 2. By comparingTables 1 and 2 we can say that the statistical probabilities of the output signals of basic elements (42 compressors, full adders and half adders) used in partial product reduction stages vary. Table 3 shows the output signal probabilities of 42 compressor, full adder and half adder, assuming equal 1 probabilities of 0. 25 for all inputs. In each partial product reduction stage the signals in a particular column have different switching probabilities. The output signals of one stage become inputs to the next stage. So the switching probabilities of the outputs diverge more as we move down the partial production reduction stages.Table 3. 1 Output Signal Probabilities of FAs and HAs Full-adder Half adder SUM CARRY A. B PSUM PCARRY Table 3 Output probabilities of 42 compressor and adder cells Input signal probabilities = 0. 25 42 compressor Full adder Half adder SumCoutC0 0. 48440. 15630. 2266 Sum channelize 0. 43750. 1563 SumCarry 0. 3750. 0625 Several reduction stages are required to reduce the partial products generated in a parallel multiplier. As shown in Fig. 3, at each stage a number of bits with the same order of magnitude are grouped together and connected to the 42 compressors and adder cells.The selection of these bits and t heir grouping influences the overall switching activity of the multiplier. This is what we will exploit to reduce the overall switching activity of the multiplier. Fig. 5 shows the array structure of the proposed partial product reduction scheme for a 16? 16 multiplier. In the following we assumed that the one probability of all the 9 partial product bits are same and is equal to 0. 25 (as discussed in Section 3. 26). These 9 partial product bits are fed to 42 compressors, full and half adders and are reduced to 5 operands. The bits in these 5 operands will have different one probabilities.From these one probabilities we can calculate their switching probability. If we look at each column all the bits in that column have the same weight but different one probability. So we have enough freedom to choose any of these signals which can be connected to any of the inputs of the basic elements. The way these signals are wired to basic elements to achieve reduction will affect the total po wer consumption in a multiplier. take an example Fig 5 shows how we wired the input signals to 42 compressors and full adders in the proposed design. To illustrate the principle consider column 16 of reduction stage 2 in Fig. , where we have five bits with the same order of magnitude, which are to be wired to the inputs of a 42 compressor. The first higher transition bit is fed to X2 input and next higher transition bit is fed to Cin, as they provide lower switching activity when compared to others. The remaining three bits can be fed to X1, X3 and X4 in any order. Similarly on column 11 in reduction stage 3, three bits of the same order are to be added. The highest transition bit is given to B input of the adder and the next higher transition bit is fed to C input. The third bit is fed to A input.This way of feeding the inputs, we can decrease the output switching probabilities of compressors and adders. By applying the same technique to every stage we can reduce the overall switc hing capacitance of the multiplier, thereby reducing power. Fig. 5. Wiring patterns for 42 compressors and full adders 5. Simulation Power analysis was done by synthesizing our 16? 16 multiplier design on Spartan-3E FPGA and using XPower Analyzer tool provided in ISE Xilinx 10. 1. We evaluated the performance of our 16? 16 multiplier by comparing with the conventional Wallace and Oskuiis multipliers.Table 4 shows the quiescent and dynamic powers of different multipliers obtained by simulation. The quiescent power is almost the same for all multipliers. The dynamic power for our design is only 360. 74 mW, where as Oskuiis and Wallace multipliers consume 454. 06mW and 475. 08 mW respectively. Hence the total power consumption is only 443. 31mW for our multiplier, which is less by 17. 39% and 20. 51%, compared to Oskuiis and Wallace multipliers. Table 4 Power reports from simulation for a 16? 16 Multiplier soma QuiescentPower (mW) DynamicPower (mW) TotalPower (mW) Our Design 82. 7 360 . 74 443. 31 Oskuiis Design 82. 57 454. 06 536. 63 WallaceMultiplier 82. 67 475. 08 557. 75 Table 5 Power-Delay products of 16? 16 multipliers Design Total Delay (ns) Power (mW) Power-Delay Product Our Design 30. 889 443. 31 13. 693*10-9 Oskuiis Design 31. 219 536. 63 16. 753*10-9 WallaceMultiplier 35. 278 557. 05 19. 651*10-9 Table 5 shows the power-delay products of different multipliers. Smaller the power delay product of a multiplier the higher is its performance. Our design has the shortest delay of 30. 889ns, compared to 31. 219ns and 35. 78ns for Oskuiis design and Wallaces design respectively. Hence our design has the lowest power-delay product compared to both Oskuiis and Wallace multipliers. 6. Conclusions We have presented an investigation of multiplier power dissipation, along with some techniques which allow reductions in power consumption for this circuit. Given the importance of multipliers, it is essential that further research efforts are to be directed in the follo wing ways. * In this thesis the switching activity criteria for the interconnection pattern in 42 compressors was used only for two of the inputs of the 42 compressor.The interconnections of signals on the other three inputs are make without any importance given to their switching activity. This is because at the gate level, the load capacitance at a node is measured simply based on the number of connections made at that node. In the 42 compressor, three of the inputs are feeding two inputs each (except the carry input). Hence, we consider them with the same load capacitance. In reality, this is not true. To get an accurate estimate on capacitance, an actual layout of the cell has to be made using VLSI layout tools and then their capacitances are to be extracted.Hence further research could focus on the above so as to find an ordering for these inputs based on their capacitance values. Also, different implementations of 42 compressors may be compared so as to select the one with the lowest capacitance values. * Extending the proposed interconnection technique to the partial product reduction stage by employing higher order compressors such as 52, 92, 282, etc. In this manner, different architectures using various combinations of compressors in the partial product reduction stage can be compared so as to select the best one with the lowest power dissipation for any multiplier.References 1 D. Soudris, C. Piguet, and C. Goutiset , Designing CMOS Circuits for Low Power. Kluwer Academic Press, 2002. 2 L. Benini, G. D. Micheli, et al. , Dynamic Power Management Design Techniques & CAD Tools. Norwell, MA Kluwer Academic Publishers, 1998. 3 A. Weinberger, 42 Carry Save Adder Module, IBM Technical Disclosure Bulletin, vol. 23, 1981. 4 S. F. Hsiao, M. R. Jiang, and J. S. Yeh, Design of High-Speed Low-Power 3-2 Counter and 4-2 Compressor for Fast Multipliers, Electronics Let. , vol. 34, no. 4, pp. 341-342, 1998. 5 J. Ohban, Multiplier Energy Reduction Through Bypassing o f Partial Products in Proc. Asia-Pacific Conf. on Circuits and Systems, vol. 2, pp. 1317, 2002. 6 M. Ito, D. Chinnery, and K. Keutzer, Low Power Multiplication Algorithm for replacement Activity Reduction Through Operand Decomposition, 21st Int. Conf. on Computer Design, 2003. 7 O. T. Chen, S. Wang, and Yi-Wen Wu, Minimization of Switching Activities of Partial Products for Designing Low-Power Multipliers, IEEE Trans. on VLSI Syst. , vol. 11, pp. 418 433, 2003. 8 M. Fujino, and V. G. Moshnyaga, Dynamic Operand Transformation for Low-Power Multiplier-Accumulator Design, in Proc. of the Int. symp. n circuits and systems, 2003. 9 K. H. Chen and Y. S. Chu, A Low Power Multiplier with Spurious Power inhibition Technique, IEEE Trans. VLSI Syst. , vol. 15, no. 7, pp. 846-850, 2007. 10 S. T. Oskuii, Transition-Activity Aware Design of Reduction-Stages for Parallel Multipliers, in Proc. of Great Lakes Symp. on VLSI, 2007. 11 K. Parker and E. J. McCluskey, Probabilistic Treatment of Genera l Combinational Networks, IEEE Trans. on Computers, C-24 668-670, June 1975. 12 M. Cirit, Estimating Dynamic Power purpose of CMOS Circuits in Proc. of ICCAD, pp. 534537, 1987.

The common application paper

Sunday, May 26, 2019

Booth Multiplier

No comments:

Post a Comment