Learning about the terms and technologies associated with overclocking will help you understand the techniques involved. PC performance depends not only on the technology of the processor itself, but on a number of subsystems as well.
Measuring Processor Performance
Early processor performance was measured in terms of how many instructions per second the architecture could execute on a standard set of data. As processors evolved and increased in complexity, a new approach was required. Frequency, the rate at which a circuit can switch between states, became the popular measurement of computational speed.
Frequency is best described as the rate at which an IC can change between its two states in a given period of time. Computer processor frequency is generally measured in megahertz (MHz), or millions of cycles per second. Gigahertz (GHz), or billions of cycles per second, is becoming the de facto standard due to significant speed gains in the latest generation of processors.
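The relationship between frequency and switching time is simple arithmetic: the clock period is the reciprocal of the frequency. A minimal sketch in Python (the function name and unit choices are illustrative, not from any particular tool):

```python
def cycle_time_ns(frequency_hz: float) -> float:
    """Clock period in nanoseconds for a given frequency in hertz."""
    return 1e9 / frequency_hz

MHZ = 1e6  # megahertz: millions of cycles per second
GHZ = 1e9  # gigahertz: billions of cycles per second

# A 100-MHz bus completes one cycle every 10 ns;
# a 1-GHz processor core completes one every 1 ns.
print(cycle_time_ns(100 * MHZ))  # 10.0
print(cycle_time_ns(1 * GHZ))    # 1.0
```

This is why each jump in frequency units corresponds to a thousandfold change in how many cycles fit into a second.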
Physical Properties of Integrated Circuits
Several physical properties directly influence a processor’s speed potential, but the die fabrication size of the processor’s core circuitry is the most important. The core die size represents the actual physical distance between each trace signal route used to construct transistor pathways within the processor. A smaller die size means that the processor can generally operate at higher clock frequencies while using less voltage and producing less heat.
The current industry standard die size is .18 micron (µ), which represents a balance between electrical and thermal constraints, yet retains scalability beyond 1 GHz.
Popular designs using the .18 -micron die size include Intel’s Pentium III Coppermine and Pentium 4 Willamette, and AMD’s Athlon Thunderbird.
Figure 3-1: .18-micron Pentium 4
A significant number of PCs still in use contain processors fabricated with the much older .25-micron core die size. These include processors like the Pentium II and K6-2. Massive cooling systems are needed when overclocking .25-micron processors, because these chips demand much higher voltage levels than their .18-micron counterparts. It is difficult to scale these older designs beyond 600 MHz.
The core die size for the latest generation of processors, like Intel’s Pentium III Tualatin and Pentium 4 Northwood, is the radically small .13 micron. These chips offer relatively low thermal dissipation rates (up to 50% lower than .18-micron models), as well as significantly lower core voltage requirements. Improved MHz scalability is the direct result of these advancements, and many .13-micron designs are expected to scale to 3 GHz and beyond before the next-generation fabrication process, which should be in the range of .07 to .09 micron, is introduced.
Overclocked chips need additional power to remain stable at extended operating speeds. Current designs are built atop a split-voltage architecture. Core voltage represents the internal electrical properties of the processor and corresponds with the die size employed during fabrication. Input/output voltage represents the operational voltage of the processor-to-chipset bus; it usually includes the power levels of other front-side bus components within a traditional system configuration.
Thermal Dissipation Rate
The thermal dissipation rate is a measurement of the heat generated within an electrical circuit, expressed in watts. Assuming that the core die size remains consistent, the thermal rate increases proportionally with rises in operational speed and core voltage. The processor’s heatsink cooling mechanism is worth examining in any PC.
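The proportional relationship described above matches the standard first-order approximation for CMOS dynamic power, P ≈ C × V² × f. The sketch below is illustrative only; the effective switched capacitance of a real core is not published, so the value used here is an assumption:

```python
def dynamic_power_watts(c_farads: float, v_core: float, f_hz: float) -> float:
    """Approximate CMOS dynamic power: P = C * V^2 * f."""
    return c_farads * v_core ** 2 * f_hz

# Illustrative numbers only -- the effective switched capacitance
# of an actual processor core is an assumption here.
C_EFF = 20e-9  # 20 nF effective switched capacitance (hypothetical)

stock = dynamic_power_watts(C_EFF, 1.70, 1.0e9)       # 1.70 V at 1.0 GHz
overclocked = dynamic_power_watts(C_EFF, 1.85, 1.2e9)  # raised voltage and clock
print(f"stock: {stock:.1f} W, overclocked: {overclocked:.1f} W")
```

Note that power rises linearly with frequency but with the square of core voltage, which is why raising voltage during overclocking demands so much more from the cooling system.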
Figure 3-2: Processor heatsink cooler
Most designs use a large metal heatsink coupled with a fan to provide a forced-air cooling system for maximum heat dissipation at a relatively low cost. Other cooling systems are available, including vapor-phase and thermoelectric technologies, but their associated costs are usually prohibitive for the average desktop PC user.
Whatever the cooling system, efficient thermal regulation is an important factor in successful overclocking. If core temperatures exceed normal operating specifications, the system can become unstable. Circuits can also be damaged during prolonged periods of intense heat.
While overclocking is often regarded as a rogue process, the premise behind it is well documented within the computer industry. Once a particular processor design has been finalized and taped out for silicon production, the manufacturer moves into the production and marketing phase of development. The manufacturing of processors, or of any circuit device, is known as fabrication.
Figure 3-3: Processor fabrication process
The fabrication of an integrated circuit device begins with the selection of a substrate material. The substrate provides a base for the layers of electrical circuits that create
transistor pathways. Both the type and quality of the substrate are important in determining maximum operating speed for a given processor design.
All commercial processors currently on the market are built atop a silicon substrate. Silicon is a readily available element with good electrical isolation properties. It can be harvested from a variety of sources, and can even be obtained from common sand. The use of silicon minimizes production costs for processor manufacturers.
Figure 3-4: Silicon substrate circuit
Silicon substrates in today’s processors contain impurities left during the extraction process. These limit the substrate’s electrical insulation efficiency, and lead to lower yield rates and slower core operating speeds.
CMOS fabrication techniques will likely change to accommodate upcoming generations of processors. Processors are currently manufactured using aluminum or copper metal layers within the transistor gate array. Copper offers less resistance and better conductivity than its aluminum counterpart. Nearly all newer processor designs therefore incorporate copper trace-route technologies, though an evolution in substrate technologies will be required to consolidate the gains in speed and efficiency.
The SOI Standard
Silicon-on-insulator (SOI) is primed to be the next substrate manufacturing standard. It differs from CMOS in that it places the transistor silicon junction atop an electrically insulated layer, commonly of glass or silicon oxide. Capacitance, a measure of a circuit’s ability to store electrical charge, can be minimized in the gate area using the SOI switching technique.
Figure 3-5: Silicon-on-insulator circuit
Any transfer medium that can conduct electricity will exhibit capacitance to some degree. A MOS transistor is regarded as a capacitance circuit, implying that the MOS circuit must actually charge to full capacitance before it can activate its switching capability. The process of discharging and recharging a transistor requires a
relatively long time compared to the time needed to switch the voltage state of a metal layer within a traditional transistor architecture. SOI is an attempt to eliminate this capacitance boundary: a low capacitance circuit will allow faster transistor operation. Accordingly, the ability to process more instructions in a given timeframe increases as latency in the transistor array decreases.
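The capacitance argument above can be sketched with the first-order RC delay model, in which switching delay scales with the product of resistance and capacitance. All numeric values below are illustrative assumptions, not measured device figures:

```python
def switching_delay_s(resistance_ohms: float, capacitance_farads: float) -> float:
    """One RC time constant -- a first-order proxy for gate switching delay."""
    return resistance_ohms * capacitance_farads

# Hypothetical values: halving the junction capacitance (as SOI aims to do)
# halves the time constant, allowing a proportionally faster transistor.
bulk = switching_delay_s(1_000, 2e-15)  # 2 fF bulk-CMOS junction (assumption)
soi = switching_delay_s(1_000, 1e-15)   # 1 fF SOI junction (assumption)
print(soi / bulk)  # 0.5
```

Under this model, any reduction in parasitic capacitance translates directly into a shorter charge/discharge time and thus a higher attainable clock rate.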
Figure 3-6: IBM SIMOX process
IBM has pioneered research into SIMOX (separation by implantation of oxygen), a method that implants oxygen into a silicon wafer at high temperature. Oxygen bonds with silicon at high temperatures, forming a thin layer of silicon oxide film. This nearly perfect layer allows for direct bonding of a pure crystalline silicon substrate. The greatest advantage of SIMOX is its significantly lower production cost compared to crystalline-based SOI methods that use expensive ruby or sapphire design materials.
Silicon-on-insulator is not the only upcoming technology to revolutionize the substrate production process. Perhaps the most promising future technology involves the compression and purification of nitrogen. In this process, purified nitrogen gas is compressed and tempered into a solid form. Once depressurized, the nitrogen remains in a solid state. Substrates produced from this technique are expected to be almost perfectly pure, while the abundant supply of nitrogen within our atmosphere could lower production costs.
Photolithography is used to etch specific circuit pathways within a processor core. A shadow mask is created from a scaled blueprint of the processor’s core circuitry. This shadow mask is then used in conjunction with a light etching process that burns the circuit pathways into the processor substrate. Additional shadow masks are then applied to create the complex multilayer circuitry found within a processor.
Figure 3-7 shows a silicon wafer being tested after etching.
Figure 3-7: Silicon wafer being tested after etching
Etching can lower production costs by producing multiple processors at once. A large wafer of silicon is placed within the light masking system, which produces a “batch” of processors during a single pass. Each processor shares a common circuit design, with certain fail-safe and redundancy features embedded into the core architecture.
Variation in quality among processors is due to the physical limitations involved in production.
Figure 3-8: Intel processor fabrication lab
AMD is scheduled to release a silicon-on-insulator processor based on its popular Athlon architecture before the end of the year 2002. This new Athlon design should arrive under a development project codenamed Thoroughbred, the first introduction of SOI technologies into the mainstream computing market for x86 architectures.
Assuming the Thoroughbred design proves successful, other manufacturers, including Intel, will move quickly to adopt similar production techniques to extend the operating speed of current processor designs.
Figure 3-9: AMD processor roadmap
Laboratory testing shows that SOI-based processors can achieve up to a 25% improvement in transistor cycle time compared to the same architecture manufactured with more traditional CMOS fabrication techniques. Performance gains can average 25 to 35% when SOI is employed. Considering the efficient scalability of such an advanced design, the upcoming Athlon Thoroughbred could rapidly emerge as the dominant choice for overclocking enthusiasts.
Quality Control and Overclocking
As with any fabrication process, the actual quality of each unit in production can vary under the influence of numerous variables, both internal and external. For example, consider automobile manufacturing. Thousands of vehicles are manufactured during every production year. One individual vehicle may outperform another of the same make and model simply because no two cars rolling off the assembly line are exactly the same.
Assembly lines operate within tolerances. Designers set base specifications that represent a minimum standard of quality before the product can be sold in the retail market. Let’s take the automotive industry example to the next level. Assume that each vehicle within a given model line must perform at a specific minimum miles-per-hour rating before being released for retail consumption. Imagine that a top speed of 100 miles per hour is such a minimum. In order to test the production quality, designers could sample two individual vehicles of the same model manufactured on different days. Vehicle A reaches maximum performance at 100 mph, so the designers are satisfied. Vehicle B offers even better performance, reaching 105 mph during testing. While both vehicles were produced at the same plant with the same materials, small differences between them can result from even smaller variances in the manufacturing process.
Figure 3-10: Intel quality control testing
Most computer processor manufacturers use more exacting tolerances than those illustrated in the automobile example, though the general analogy holds. Because the average processor comprises millions of microscopic transistor circuits, the possibility for variances is considerable. For example, each speed grade of the Athlon XP processor from AMD is fabricated to essentially the same expected tolerances. Minor fabrication differences among units may lead one processor to reach a maximum stable operational speed of 1.4 GHz, while another chip of the same design may operate at 1.6 GHz.
The Economics of Speed Binning
Automobile companies sell all comparably equipped vehicles of a particular model at the same base price, while a processor company can choose to sell the better-performing chip at a higher price to maximize profit yields with lower capital costs. If the automobile manufacturer in the above example operated like a processor manufacturer, it would sell the car that reached 100 mph at one price and the car that reached 105 mph at a higher price, even though both vehicles were essentially identical in every other respect. Conversely, a processor manufacturer like AMD could behave like an automobile maker, offering each CPU without any performance rating beyond the flat minimum speed requirement. This marketing strategy would yield the same base revenue for all processors, but profits would decline due to the standard economics of supply and demand.
Offering varying speed grades of the same product maximizes profit. Most computer users exhibit a consistent desire for better performance, whether they need it or not. Computer performance is dictated by megahertz ratings only when all other subsystem characteristics are identical; nonetheless, most buyers equate MHz ratings with performance, though many other aspects of processor design contribute to it. The cost of acquiring higher-MHz models can prove limiting, which leads some consumers to investigate the benefits of overclocking.
Popular fabricators and chip suppliers like Intel and UMC produce millions of circuit-based devices each year. The trend toward bulk fabrication techniques leads to a practice known as speed binning, which allows the computer industry to differentiate cost and performance characteristics while protecting profits. Still, the sheer number of chips being produced each year prevents manufacturers from testing every individual processor for its maximum operating speed potential.
Figure 3-11: Intel quality control testing
In speed binning, the manufacturer selects processors from a given production batch and puts them through reliability testing to determine the maximum reliable speed common to all processors in that batch. The processors in the batch are then usually marked for sale at the speed rating determined in testing.
Even though speed binning is a well-developed and highly efficient process, quality variances still exist among processors in any given batch. These variances often allow processors to be overclocked beyond their rated speed, since the batch’s rating must be one at which every processor in the batch can operate, even though not every individual processor is speed tested.
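The batch-rating logic just described can be sketched as follows; the sample speeds and the ladder of speed grades below are hypothetical, chosen only to illustrate the idea:

```python
def bin_speed(sampled_max_mhz: list, grades_mhz: list) -> int:
    """Mark a batch at the highest grade no faster than its weakest sample."""
    weakest = min(sampled_max_mhz)
    eligible = [g for g in grades_mhz if g <= weakest]
    return max(eligible)

# Hypothetical batch: sampled chips top out between 1,450 and 1,620 MHz,
# so the whole batch ships at the 1,400-MHz grade.
samples = [1450, 1530, 1620, 1480]
grades = [1200, 1333, 1400, 1533, 1600]
print(bin_speed(samples, grades))  # 1400
```

The gap between a chip's sampled maximum (here up to 1,620 MHz) and the conservative batch rating (1,400 MHz) is exactly the headroom that overclockers exploit.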
Speed binning produces unique benefits for both manufacturers and consumers. Manufacturers can charge higher selling prices for better performing processors. Consumers can opt for lower speed grades to minimize costs, while enthusiasts among them may obtain an even greater performance-to-price ratio if they are willing to push the stability envelope and overclock their processors.
The frequency of a processor represents only the core operating speed; each subsystem within a computer may operate at various other rates. It is important to understand how these frequencies interact with each other before embarking on the overclocking process. Additional factors related to the processor’s physical properties play a key role in understanding the process. These include core die sizes, electrical aspects, and thermal regulation.
Figure 3-12: Traditional motherboard layout
A phase-locked loop (PLL) circuit resides at the simplest level of the frequency generation equation. Some older designs were based around a fixed-frequency crystal, though PLL circuits have been the mainstay logic timing control technology for many years now. The PLL acts as a base frequency synthesizer by cycling its generated signal according to a preprogrammed routine. Locking the circuit into a specific pattern creates a phase shift in the signal, producing a cycling effect that drives the frequency generation scheme. The PLL signal travels across a dedicated timing bus on the motherboard to dictate the frequency needed for the operation of other buses. The primary recipient of the PLL signal is the motherboard’s main controller, known as the chipset.
Chipset designs differ greatly across the wide range of platforms available, though the basic concept is shared. The frequency rate at which the chipset operates is the motherboard’s primary operating speed. The chipset provides a communications hub for all of the system’s various components. It also controls routing and logic for most primary control operations, ranging from memory addressing to data transfers across different bus standards.
The term front-side bus rate is widely used to describe the motherboard’s frequency rate, as this same rate is often also used for the memory and processor buses within a traditional system design. To confuse matters, many of the latest architectures like the AMD Athlon or Intel Pentium 4 blur the relationships among each of these three primary buses by separating each bus at the chipset connection point. The back-side bus, on the other hand, is generally composed of additional input/output mechanisms, such as PCI and AGP connection buses.
Figure 3-13: Intel i850 chipset diagram
Upon receiving the base PLL signal, the chipset generates a signal to the other buses. The most important signal to overclockers is the processor bus rate of the front-side bus, as this directly determines the central processing unit’s core operating speed when combined with the processor multiplier value. The PLL circuit provides the base timing signal for the motherboard chipset, which in turn passes the value to the processor. The processor then internally multiplies this clock rate to derive its core clock operating frequency.
Frequency Timing Scheme
The best way to describe this process is to refer to a common system design, such as the Pentium III platform. A quick examination of a common chipset, such as VIA’s Pro133A, shows how the process actually works. The Pro133A chipset is built primarily for 100-MHz operation, though the Pentium III processor itself features a much higher operating speed. The core processor rate is determined by applying a multiplier to the timing signal. Thus, a Pentium III 650e processor uses a 6.5x clock multiplier with the chipset operating at 100 MHz. Multiplier values are generally spaced in .5x increments; this scheme allows for a wide range of operating frequencies when speed-binning processors.
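The multiplier arithmetic from the Pentium III example can be expressed directly (the function here is just a sketch of the relationship, not part of any real tool):

```python
def core_clock_mhz(fsb_mhz: float, multiplier: float) -> float:
    """Core frequency = front-side bus rate x processor multiplier."""
    return fsb_mhz * multiplier

# The Pentium III 650e from the text: 100-MHz bus with a 6.5x multiplier.
print(core_clock_mhz(100, 6.5))  # 650.0
```

This relationship is also why overclockers have two levers to pull: raising the bus rate, raising the multiplier, or both.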
Most platforms use the timing scheme presented in the Pentium III example, though some of the newer architectures, notably AMD’s Athlon and Intel’s Pentium 4, alter the interpretation. The x86 Athlon uses a modified bus architecture derived from the non-x86 DEC Alpha EV6. The Athlon applies a double data rate (DDR) signaling pattern to the processor-to-chipset interconnect bus. DDR signaling uses both the rising and falling edges of the base clock signal to transfer twice as much data as a traditional bus can in the same period of time.
Figure 3-14: QDR signal pattern
The Pentium 4 goes one step further with a pseudo quad data rate (QDR) processor bus design. Without going too deeply into technical issues, the P4 processor bus can be viewed as implementing DDR signaling across two 180-degree co-phased timing signals that travel essentially the same bus pathway. More about each of these platforms can be found in the architecture-specific overclocking sections of this book.
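The effective transfer rates of DDR and QDR buses follow directly from the number of transfers per clock cycle; the 100-MHz base clock used below is illustrative:

```python
def effective_rate_mhz(base_clock_mhz: float, transfers_per_cycle: int) -> float:
    """Effective data rate: base clock x transfers per clock cycle."""
    return base_clock_mhz * transfers_per_cycle

# Athlon EV6 bus: DDR signaling, 2 transfers/cycle -> 200 MHz effective.
# Pentium 4 bus: pseudo-QDR signaling, 4 transfers/cycle -> 400 MHz effective.
print(effective_rate_mhz(100, 2))  # 200
print(effective_rate_mhz(100, 4))  # 400
```

Keeping base clock and effective rate distinct matters when overclocking, since it is the base clock that gets raised while the multiple stays fixed.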
Figure 3-15: CPU-Z CPUID application
A CPU identification utility can help you determine the particular model and speed grade of your PC’s processor if you are unsure of its configuration. CPU-Z by Franck Delattre is a popular freeware example. This valuable utility can be obtained at http://www.cpuid.com/cpuz.htm. CPU-Z can provide information about multiplier values, bus rates, and various other technological aspects of most currently available processors.