Baugh-Wooley Multiplier

May 2010




This project for ESE-570 (VLSI) resulted in a 4-bit multiplier able to handle signed multiplication. It was constructed and tested in the Cadence software environment. The multiplier is implemented using the Baugh-Wooley algorithm, which uses very few logical operations in each step of the multiplication and therefore does not suffer large cumulative gate delays. In addition, the algorithm uses common logical units to simplify the design and handles signed inputs more efficiently than many other algorithms. The result is a relatively short calculation time at a low production cost. Although the algorithm is not able to work with unsigned inputs, the goal of the project was a signed multiplier.


0.6um CMOS technology
Input: Two 4-bit signed (two’s complement) binary operands
Output: One 8-bit signed (two’s complement) binary product
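
As a rough illustration of the arithmetic behind this specification, the following Python sketch models the textbook Baugh-Wooley formulation at the bit level: the partial products that involve exactly one sign bit are inverted, and two constant 1s are added so that every term can be summed as a plain unsigned value. It is a behavioral model only, not the project's schematic or layout, and the function name baugh_wooley_multiply is a hypothetical one chosen for this example.

```python
def baugh_wooley_multiply(a: int, b: int, n: int = 4) -> int:
    """Behavioral model of an n-bit signed Baugh-Wooley multiplication.

    Hypothetical helper for illustration; it mirrors the standard
    Baugh-Wooley partial-product array, not the authors' circuit.
    """
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError(f"operands must fit in {n}-bit two's complement")

    # Two's-complement bit vectors, least significant bit first.
    in_mask = (1 << n) - 1
    a_bits = [((a & in_mask) >> i) & 1 for i in range(n)]
    b_bits = [((b & in_mask) >> j) & 1 for j in range(n)]

    total = 0
    for i in range(n):
        for j in range(n):
            pp = a_bits[i] & b_bits[j]  # one AND gate per partial product
            # Invert partial products that involve exactly one sign bit.
            if (i == n - 1) != (j == n - 1):
                pp ^= 1
            total += pp << (i + j)

    # Correction constants: a 1 in column n and a 1 in column 2n-1.
    total += (1 << n) + (1 << (2 * n - 1))

    # Keep 2n bits and reinterpret them as a signed value.
    total &= (1 << (2 * n)) - 1
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total


if __name__ == "__main__":
    for a, b in [(-8, -8), (-8, 7), (3, -5), (7, 7)]:
        print(f"{a} x {b} = {baugh_wooley_multiply(a, b)}")
```

Because every partial product is treated as a non-negative bit, the same adder cell can be used throughout the array, which fits the regular structure described in this write-up.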


1,158 Transistors:

  • 15 Full Adders: 2 NOT, 3 AND, 1 OR (6 NOT, 3 NAND, 1 OR)
  • 16 AND gates:  16 NOT, 16 NAND
  • 8 NOT gates


The Baugh-Wooley multiplier is an interesting implementation of a multiplier. It produces a relatively standard cell shape for easier manufacturing, while maintaining good driving characteristics. Each full adder within the multiplier performs a similar number of computations, and the diagonal ripple carry propagates each input signal through a similar, roughly constant number of stages.

One issue encountered during the design and layout of the multiplier involved the layout of the required full adders. A single full adder circuit naturally lays out as a very wide (or tall) cell, which creates problems when working toward the smallest form factor and lowest cost. We attempted a number of creative solutions to work around this, including stacking these cells to create a more square form. This, however, produced long connections as the signal wrapped around from the output of one full adder to the input of another, and introduced both losses and capacitance. We determined that a better solution was to implement the multiplier in a single, linear direction through four stages of full adders, with only three rows of full adders adding width to the layout. This resulted in a flatter, more rectangular circuit, but it allowed us to drastically reduce the distance the signal had to travel between cells and to decrease the overall complexity of connections within the circuit.

This solution also resulted in a chip that could be easily located near the edge of an ALU, or more generally function as an auxiliary piece of circuitry in a larger chip. A highly regular shape (i.e. a rectangle) can be easily built around and incorporated into 3rd party projects. Because the contents of the chip are densely packed, it can be added to more complex circuits while wasting minimal die space for the 3rd party.

When initially designing our chip, we simulated a schematic and observed the multiplier’s response to changes in input. The performance goals for the multiplier were determined by observing the switching characteristics of the theoretical multiplier (using a 25 fF load capacitance). These values were set as the baselines for the switching behavior of the circuit. As shown in the table below, the performance goals were met and exceeded.

             Performance Goal    Performance Achieved
Rise Time    246 ps              187 ps
Fall Time    327 ps              315 ps

In conclusion, a Baugh-Wooley multiplier was successfully constructed using the Cadence Virtuoso layout environment. The multiplier was created in hierarchical fashion, and a total of 1,158 active transistors were used in a 0.6um CMOS technology. The resulting footprint of this chip was roughly 830um x 190um, or about 157,700um2 (roughly 0.16mm2).


Text adapted from: Baugh Wooley Multiplier, Paul Martin, William Etter, and Chris Setian, May 4th 2010

