Dual Core Architecture Seminar Paper

Lappeenranta University of Technology

Information Technology
CT30A7001
Concurrent and Parallel Computing

DUAL CORE ARCHITECTURE


Seminar Paper
Oct 30, 2008

Group 04
Manish Thapa [c0346938 manish.thapa@lut.fi]
Madan Kadariya [c0346967 madan.kadariya@lut.fi]

ABSTRACT

Lappeenranta University of Technology


Department of Information Technology

Manish Thapa
Madan Kadariya

Dual Core Architecture Seminar Paper

2008

15 pages

Examiners: Professor, D.Sc. (Tech.) Jari Porras

Keywords: dual core, multithreading, Intel Itanium architecture

As the clock speed of a single-core processor increases, so does the heat it dissipates. Uncontrolled heat causes errors and can even damage the processor, so raising the clock speed to gain performance reached a limit. A way was therefore sought to keep increasing processor capability without raising the clock speed. The resulting idea was to place two processors, together with their caches and cache controllers, in a single package. This is how dual-core development began.

This seminar paper briefly explains dual-core processors: why they are needed, how they are used, and how they compare with single-core processors and dual-processor systems. It then illustrates one dual-core architecture, the Intel Dual-Core Itanium 2 processor, and discusses several topics related to this processor that give some insight into parallel computing.

TABLE OF CONTENTS

1. Introduction
2. Need of dual core
3. Stand of a dual core
4. Utilization
5. How dual core works
6. Intel® Dual-Core Technology
7. Intel Itanium 2 Processor
7.1 Introduction
7.2 History
7.3 Architecture
7.3.1 The Intel Itanium architecture
7.3.2 Instruction Execution
7.3.3 Memory architecture
7.4 Features
7.5 Software support
7.6 Competition
8. Conclusion
References

1. INTRODUCTION

A dual-core CPU has two complete execution cores in a single integrated circuit. It combines two processors and their caches and cache controllers on a single piece of silicon, referred to as a die. Both cores work side by side, sharing the processing and execution load. A dual-core processor is monolithic, meaning that all of its cores are on a single die. Each core independently implements optimizations such as superscalar execution, pipelining, and multithreading. A system with n cores is used effectively only when it is presented with n or more threads to run. [1]
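
The remark that n cores need n or more threads can be illustrated with a minimal Python sketch. It assumes a workload that splits cleanly into independent pieces (here, summing a list), cuts it into one chunk per logical CPU reported by `os.cpu_count()` (a count that includes Hyper-Threading contexts, not only physical cores), and gives each chunk its own worker thread from `concurrent.futures.ThreadPoolExecutor`. The helper names `chunk` and `parallel_sum` are hypothetical. One caveat: in CPython the global interpreter lock keeps CPU-bound threads from executing Python bytecode in parallel, so this shows how the work is decomposed rather than delivering a real dual-core speedup; `ProcessPoolExecutor` is the usual substitute when that matters.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk(data, n):
    """Split `data` into `n` contiguous slices whose lengths differ by at most one."""
    size, extra = divmod(len(data), n)
    slices, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        slices.append(data[start:end])
        start = end
    return slices


def parallel_sum(data, workers=None):
    """Sum `data` with one worker thread per chunk.

    By default the number of workers equals the number of logical CPUs
    reported by the OS; os.cpu_count() can return None, hence the fallback.
    """
    n = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Each thread sums its own slice; the partial sums are combined here.
        return sum(pool.map(sum, chunk(data, n)))
```

For example, `parallel_sum(list(range(10)), workers=2)` splits the list into `[0, 1, 2, 3, 4]` and `[5, 6, 7, 8, 9]`, sums each half in a separate thread, as the two cores of a dual-core CPU would, and returns 45.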

Although the terms sound similar, a dual-core CPU and a dual-processor system are different. In a dual-core CPU, both cores reside in the same package. Dual-processor refers to using two processors that are not necessarily on the same chip, and not necessarily even on the same motherboard.

Fig 1: Diagram of a generic dual-core processor with CPU-local level 1 caches and a shared, on-die level 2 cache.
Source: http://en.wikipedia.org/wiki/Image:Dual_Core_Generic.svg

2. NEED OF DUAL CORE

As manufacturing technology advances and individual gates shrink, the physical limits of semiconductor-based microelectronics have become a major design concern. These limits cause heat dissipation and data synchronization problems. Building ever more capable microprocessors therefore required a new approach that could work within those physical limits. Instruction-level parallelism (ILP) methods such as superscalar pipelining were considered, but they are inefficient for code whose behavior is difficult to predict. Exploiting thread-level parallelism (TLP) is a better idea, and using multiple independent CPUs is one common way to increase a system's overall TLP. The combination of the extra die space made available by refined manufacturing processes and the demand for more TLP, both to solve bigger real-life problems and for gaming, is the logic behind the creation of dual-core CPUs. [2]

A processor is a device that executes a series of instructions telling it what to do, and the faster it does so, the better. "Faster" is directly related to clock speed: a processor's speed is largely determined by how fast its clock tells it to perform instructions. The computer market has long enjoyed steady growth in processor speeds, and both AMD and Intel scaled up the clock speeds of their processors in a very short time, but they have recently slowed this curve. Power requirements constrain the rate at which clock speeds can be increased. This trend is shown clearly in Figure 2 below, where the average clock speed and heat dissipation of Intel and AMD processors are plotted over time. [3]

Fig 2: Average clock speed and heat dissipation for Intel and AMD processors.

Source: www.ccur.com/pdf/Preparing_forthe_multicore_revolution.pdf

The graph shows power consumption rising, which demands additional cooling and electrical supply to keep the processor operating. More power consumption means more heat production. The solution was to scale out the number of processor cores instead of scaling up the CPU frequency. Some argue that the flattening of the clock speed curve is the reason for the shift to dual core. The drop-off in clock speed on the graph marks the delivery of the first dual-core processors from AMD and Intel.
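
The link between clock speed and heat can be made concrete with the standard first-order model of dynamic (switching) power in CMOS circuits. This model is textbook material rather than something taken from the sources above: α is the fraction of the circuit switching each cycle, C the switched capacitance, V the supply voltage and f the clock frequency. The second line is a purely illustrative assumption, not measured data, in which one core at f and V is replaced by two identical cores, each running at 0.8f and 0.8V.

```latex
\begin{aligned}
P_{\mathrm{dyn}}  &\approx \alpha \, C \, V^{2} f \\
P_{\mathrm{dual}} &\approx 2\,\alpha C\,(0.8V)^{2}(0.8f)
                   = 2 \times 0.512\,\alpha C V^{2} f
                   \approx 1.02\,P_{\mathrm{dyn}}
\end{aligned}
```

Under that assumption, the two slower cores draw roughly the same dynamic power as the single fast core while providing 1.6 times as many total clock cycles per second. Because a lower clock frequency usually also allows a lower supply voltage, power falls faster than linearly as clock speed is reduced, which is why scaling out cores is more attractive than scaling up frequency.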

The electricity running around the die is prone to noise, that is, interference. The pathways on a processor are microscopically close together, and the more power that runs through them to reach higher clock speeds, the more electrical radiation leaks from one pathway to the next. That leakage could corrupt the data in a neighboring pathway. [4] Due to such heat issues, dual-core processors are designed to run at a lower clock rate than single-core designs. In theory, these dual-core chips can deliver twice the performance of a single-core chip and thus help continue the processor performance march [3], though in practice this is not achievable because of the overheads involved in using multiple cores.

3. STAND OF A DUAL CORE

A dual-core CPU cannot be twice as fast as a single-core model running at the same clock speed, because several factors degrade its performance. A dual-core CPU only pays off when there is more than one discrete stream of work, known as a "thread", to run. A single-threaded application running on a dual-core CPU simply will not benefit from the second core. Sharing work between cores also involves overhead, including load balancing, communication between cores and synchronization. Depending on the nature of the task, adding a second core has been observed to boost performance by up to 70 percent over a single-core CPU. But because dual-core CPUs run at lower clock rates, their advantage over competing single-core processors is slim. Even so, dual-core CPUs have real benefits. Business users, for instance, typically have several programs open at once, and a dual-core CPU speeds things up when many things happen at the same time, such as editing a document while a web page loads and a media player plays music.
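
Why a second core gives "up to 70 percent" rather than 100 percent can be illustrated with Amdahl's law, a standard model that the sources above do not name explicitly. If a fraction p of a program's work can run in parallel on n cores and the rest is serial, the ideal speedup is 1 / ((1 - p) + p / n). The Python sketch below also adds a hypothetical `overhead` term (extra time, as a fraction of the single-core running time, spent on load balancing, communication and synchronization); that term is an assumption for illustration, not a measured value.

```python
def speedup(p, n=2, overhead=0.0):
    """Amdahl's law speedup for parallel fraction p on n cores.

    `overhead` is a hypothetical extra cost, expressed as a fraction of the
    single-core running time, added to model coordination between cores.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 1.0 / ((1.0 - p) + p / n + overhead)


def parallel_fraction_for(target, n=2):
    """Invert Amdahl's law: the parallel fraction p needed for `target` speedup."""
    # target = 1 / ((1 - p) + p / n)  =>  p = (1 - 1 / target) / (1 - 1 / n)
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / n)
```

For example, `speedup(1.0)` returns the theoretical 2.0 for perfectly parallel work, `speedup(0.0)` returns 1.0 for a single-threaded program, and `parallel_fraction_for(1.7)` returns about 0.82. Under this model, the 70 percent gain mentioned above would require roughly 82 percent of the work to run in parallel with no coordination overhead at all; any nonzero `overhead` pushes the speedup further below 2.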

Most importantly, more and more software is being tuned with dual-core processors in mind. Many
game vendors and graphics-card companies have aggressively adopted multithreaded architectures
to tap dual-core systems. "Even if the game is single-threaded, all the graphics and 3D [drivers] are
multithreaded," says Brookwood. [5]

Multithreaded code is used by many media-creation applications, such as Adobe Photoshop and Premiere. Multithreading can therefore be expected to become more pervasive as software vendors seek to cater to a large installed base of dual-core CPUs. Dual-core processors work best when software can run in parallel on them. So-called multithreaded applications benefit from an additional core because their subroutines can be assigned to different cores. Administering the threads carries an overhead, though, which means that a dual-core processor is never exactly twice as fast as a single core. [5]
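
The overhead of administering threads is most visible when threads share data. The minimal Python sketch below, whose function names are hypothetical, counts with two threads in two ways. In `locked_count`, every update to the shared counter takes a `threading.Lock`, so the threads spend much of their time coordinating. In `local_then_combine`, each thread counts privately and the results are combined once at the end, the usual way to keep synchronization costs down. Both return the same total; the difference lies in how often the threads must synchronize. Because of CPython's global interpreter lock, the sketch illustrates the pattern rather than a measured dual-core speedup.

```python
import threading


def locked_count(increments_per_thread, threads=2):
    """Every increment acquires a shared lock: correct, but coordination-heavy."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments_per_thread):
            with lock:  # one lock acquire/release per update
                counter += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter


def local_then_combine(increments_per_thread, threads=2):
    """Each thread counts privately; results are combined once at the end."""
    results = [0] * threads

    def worker(index):
        local = 0
        for _ in range(increments_per_thread):
            local += 1  # no shared state is touched inside the loop
        results[index] = local  # each thread writes only its own slot

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return sum(results)
```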

Thus, a dual-core processor is a cross between a single-core processor and a dual-processor system. A dual-core processor has two cores that share hardware such as the memory controller and bus, whereas a dual-processor system has completely separate hardware for each processor. A dual-core processor will not be twice as fast as a single-core processor, nor will it be as fast as a dual-processor system. In terms of performance it sits between the two, but it still has a lot to offer.

4. UTILIZATION

To utilize a dual-core processor to the fullest, the operating system must support multithreading, and the software that runs on it must be designed to use multiple threads. A multithreaded program lets the operating system schedule its threads on both cores in parallel, whereas a single-threaded program can use only one core at a time. Multithreading is common in server software, and Adobe Photoshop, for instance, is multithreaded as well. [6] Simultaneous multithreading (SMT), which Intel markets as Hyper-Threading, is a related hardware technique that lets a single core run more than one thread at once.

Thus, complete optimization for a dual-core processor requires both the operating system and the applications running on the computer to support thread-level parallelism (TLP). Thread-level parallelism means that the OS or application runs multiple threads simultaneously, where a thread is a part of a program that can execute independently of other parts.

Even without multithreaded applications, we can still benefit from a dual-core processor if the OS supports TLP. For example, Microsoft Windows XP supports multithreading, so we can browse the web, run a virus scanner and stream audio or video at the same time, and the dual-core processor will run the threads of these programs simultaneously, improving performance and efficiency.

Today, the latest operating systems and hundreds of applications already support multithreading, especially applications for editing and creating music, video and graphics, because these types of programs need to perform operations in parallel. As dual-core technology becomes more common in homes and workplaces, awareness of the need to build systems that support thread-level parallelism is also increasing.

5. HOW DUAL CORE WORKS

In a single-core, or traditional, processor, the CPU is fed strings of instructions that it must order and execute, and then selectively store in its cache for quick retrieval. When data outside the cache is required, it is retrieved through the system bus from random access memory (RAM) or from other storage devices. Accessing these limits performance to the maximum speed that the bus, RAM or storage device allows, which is far slower than the speed of the CPU. The situation is compounded by multitasking, where the processor must switch back and forth between two or more sets of data streams and programs. CPU resources are depleted and performance suffers.

In a dual-core processor, each core handles incoming data strings simultaneously to improve efficiency: while one core is executing, the other can be accessing the system bus or executing its own code. AMD and Intel, the leading processor manufacturers, both produce dual-core processors. [6]
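
As a rough software analogy of two cores sharing incoming work (not a model of the hardware itself), the Python sketch below starts two worker threads that take tasks from one shared `queue.Queue`; whichever worker is free picks up the next task, so one can make progress while the other is busy. The `run_two_workers` name and the representation of tasks as zero-argument callables are hypothetical choices for illustration.

```python
import queue
import threading


def run_two_workers(tasks, workers=2):
    """Run `tasks` (zero-argument callables) on `workers` threads sharing one queue.

    Returns a dict mapping each task's index to (worker_id, result).
    """
    jobs = queue.Queue()
    for index, task in enumerate(tasks):
        jobs.put((index, task))  # all work is queued before the workers start

    results = {}
    results_lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                index, task = jobs.get_nowait()
            except queue.Empty:
                return  # no work left for this "core"
            value = task()
            with results_lock:
                results[index] = (worker_id, value)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, passing ten tasks that each square a number returns all ten results, each tagged with the id (0 or 1) of the worker that ran it; which worker handles which task depends on thread scheduling and can differ between runs.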

6. INTEL® DUAL-CORE TECHNOLOGY

Designed from the ground up for energy-efficient performance, Intel® dual-core processors provide productivity-enhancing features and rich multimedia experiences. With their different approach to processor architecture design, Intel dual-core processors have become the standard for desktop, mobile and server platforms, letting users do many things at once without slowing down. The key features are:

• Boost multitasking power with improved performance for highly multithreaded and compute-intensive applications
• Reduce costs and use less power with energy-efficient Intel dual-core processors built on Intel® Core™ microarchitecture
• Enjoy flexibility and the performance to handle robust content creation or intense gaming, with multimedia-enabling technologies built in [7]

7. INTEL ITANIUM 2 PROCESSOR

7.1 Introduction
The Itanium is a 64-bit Intel microprocessor that implements the Intel Itanium architecture. There are two processor families in the Intel Itanium architecture: the Itanium and Itanium 2 families. These processors are mostly used in high-performance computing systems and enterprise server solutions. The architecture was initially developed at HP, and HP and Intel later collaborated to build the Itanium series of processors.
The first Itanium microprocessor was released in 2001, and more powerful Itanium processors have been released frequently since then. HP produces most Itanium-based systems, but several other manufacturers have also developed systems based on Itanium. As of 2007, Itanium was the fourth-most deployed microprocessor architecture for enterprise-class systems. The architecture differs from the earlier x86 architecture and can execute up to six instructions per cycle.

7.2 History
Development
Explicitly Parallel Instruction Computing (EPIC), which implements a form of Very Long Instruction Word (VLIW) architecture, allows the processor to execute more than one instruction per clock cycle. EPIC was conceived as a successor to reduced instruction set computing (RISC) designs, which HP researchers believed were approaching a limit of one instruction per cycle. With EPIC, the compiler determines in advance which instructions can be executed at the same time, so the microprocessor simply executes the instructions and does not need elaborate mechanisms of its own to decide which instructions to execute in parallel.
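
To make the EPIC idea concrete, the Python sketch below plays the compiler's role in a greatly simplified way: it walks a straight-line sequence of instructions and closes an instruction group (IA-64 marks such boundaries with a "stop") whenever an instruction would read or overwrite a register written earlier in the current group. Instructions inside one group have no such register dependencies and could be issued together. The tuple instruction format and the `group_instructions` name are hypothetical, and real IA-64 compilers also consider functional-unit types, latencies and bundle templates.

```python
def group_instructions(program):
    """Split a straight-line program into dependency-free issue groups.

    Each instruction is a tuple (text, dest_register, source_registers).
    A new group starts when an instruction reads (RAW) or rewrites (WAW)
    a register that was already written in the current group.
    """
    groups, current, written = [], [], set()
    for text, dest, sources in program:
        if written & set(sources) or dest in written:
            groups.append(current)  # the compiler would place a "stop" here
            current, written = [], set()
        current.append(text)
        written.add(dest)
    if current:
        groups.append(current)
    return groups
```

For the four-instruction program `add r1 = r2, r3`, `add r4 = r5, r6`, `add r7 = r1, r4`, `sub r8 = r2, r5`, the function returns two groups: the first two additions are independent and can issue together, while the third needs `r1` and `r4`, so a stop is placed before it and the final subtraction joins it in the second group.
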
In 1994, HP partnered with Intel to develop the IA-64 architecture, which was derived from EPIC. Intel undertook a large development effort on IA-64 in the expectation that it could sell the architecture to the majority of enterprise systems manufacturers. HP and Intel set out to deliver the first product, codenamed Merced, in 1998. The joint project suffered from structural problems, since Intel and HP used different methodologies and had slightly different priorities, and the schedule slipped. Intel announced the official name of the processor, Itanium, on October 4, 1999. By the time Itanium was released in June 2001, it was no longer superior to contemporary RISC and CISC processors. Sales fell short of expectations because of poor yields, relatively poor performance, high cost and limited software availability, and the lack of software was a serious obstacle to further adoption. To stimulate development, Intel made thousands of these early systems available to independent software vendors (ISVs). HP and Intel brought the next-generation Itanium 2 processor to market a year later. [8]

Itanium 2 processors: 2002–present
The Itanium 2 was released in 2002, aimed at enterprise servers rather than at high-end computing in general. The initial Itanium 2, codenamed McKinley, used a 180 nm process and relieved many of the performance problems of the original Itanium. In 2003, AMD released the Opteron, which implemented its 64-bit x86-64 architecture. Opteron gained rapid acceptance in the enterprise server space because it provided an easy upgrade from x86, and Intel responded by implementing x86-64 in its Xeon microprocessors in 2004. Intel released a new Itanium 2 family member, codenamed Madison, in 2003. Madison used a 130 nm process and was the basis of all new Itaniums until Montecito was released in June 2006.
Itanium is not a high-volume product for Intel. Intel does not release production numbers, but one industry analyst estimated the production rate at 200,000 processors per year in 2007. The total number of Itanium servers sold by all vendors in 2007 was about 55,000, compared with 417,000 RISC servers and 8.4 million x86 servers. An IDC report shows that a total of 184,000 Itanium-based systems were sold from 2001 through 2007. In the second quarter of 2008, Itanium-based system revenue reached 26%. [8]

7.3 Architecture
7.3.1 The Intel Itanium architecture
Widely referred to as IA-64, the Intel Itanium architecture is a 64-bit, register-rich, explicitly parallel architecture. The base data word is 64 bits, and memory is byte-addressable. The logical address space is 2^64 bytes. The architecture implements predication, speculation and branch prediction under compiler control, and each instruction word includes extra bits for this. It uses a hardware register renaming mechanism, rather than simple register windowing, for parameter passing; the same mechanism also permits parallel execution of loops. The architecture provides 128 integer registers, 128 floating-point registers, 64 one-bit predicates and eight branch registers. The floating-point registers are 82 bits long to preserve precision for intermediate results. [8]
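
Predication lets the compiler replace a short branch with instructions that are all issued but only take effect when their one-bit predicate is true, removing a hard-to-predict branch. The Python sketch below only mimics that idea at a high level: `with_branch` uses an ordinary `if`, while `predicated` computes a predicate pair, like the complementary predicates an IA-64 compare instruction writes, evaluates both candidate results and keeps the one whose predicate is set. The function names are hypothetical, and the assembly in the docstring is an approximate IA-64-style illustration; on real hardware the selection happens through predicate registers, not arithmetic.

```python
def with_branch(a, b):
    """Conventional control flow: the processor must predict which way the branch goes."""
    if a > b:
        return a - b
    return b - a


def predicated(a, b):
    """If-converted version: both results are computed and predicates pick one.

    Roughly mirrors an IA-64 style sequence such as:
        cmp.gt p6, p7 = r32, r33   // p6 = (a > b), p7 = its complement
        (p6) sub r8 = r32, r33
        (p7) sub r8 = r33, r32
    """
    p6 = a > b   # predicate
    p7 = not p6  # complementary predicate
    result_if_p6 = a - b
    result_if_p7 = b - a
    # Exactly one predicate is true, so exactly one result is kept.
    return result_if_p6 * p6 + result_if_p7 * p7
```

Both functions return the same value for every input, for example 2 for `(5, 3)` and for `(3, 5)`; the predicated form trades a possibly mispredicted branch for a little extra computation.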

Fig 3: Itanium architecture

Source: http://upload.wikimedia.org/wikipedia/commons/7/7c/Itanium_arch.png

The Dual-Core Intel Itanium 2 processor has two complete 64-bit processing cores on a single die, with up to 24 MB of low-latency L3 cache that provides high bandwidth to both cores. It incorporates Hyper-Threading (HT) technology, which doubles the number of threads each core presents to the operating system, so the operating system can run four threads at once, four times as many as on a single-core processor without HT. The large cache together with Hyper-Threading doubles performance compared with earlier dual-core processors. EPIC provides advanced implementations of parallelism, predication and speculation for high instruction-level parallelism (ILP), which helps meet the requirements of high-end business enterprise and simulation workloads. The Dual-Core Intel Itanium 2 processor includes hardware-assisted virtualization that improves virtualization efficiency and broadens operating system compatibility. Combined with the dual-core performance improvements and scalability advantages, Intel Virtualization Technology makes Dual-Core Itanium 2-based systems an excellent platform for data-intensive virtualization. [9]
The Intel Itanium 2 uses 20% less power than the previous dual-core Itaniums and delivers 2.5 times higher performance per watt, lowering energy requirements while substantially improving performance. The Itanium contains 128 general-purpose and 128 floating-point registers that support rotation, and a register stack engine improves the management of processor resources. The Itanium 2 also supports predication and speculation, which help improve processing performance. It has a high-bandwidth system bus for scalability, with up to 8.53 GB/s of bandwidth and a 128-bit data bus (64 bits dedicated to each core). It also provides 50 bits of physical memory addressing and 64 bits of virtual addressing. The buses, running at 400 to 533 MT/s (million transfers per second), are expandable to systems with multiple system buses.
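
The bus and addressing figures above can be cross-checked with simple arithmetic, sketched in Python below. Peak bandwidth is the bus width in bytes multiplied by the number of transfers per second. The sketch treats 400 and 533 as transfer rates in millions of transfers per second, with 533 taken as 533.33; that interpretation is an assumption, but it is the one under which a 128-bit bus yields the 8.53 GB/s quoted here. The function and constant names are hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits, mega_transfers_per_s):
    """Peak bus bandwidth in GB/s (10**9 bytes per second).

    bandwidth = (bus width in bytes) x (transfers per second)
    """
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with `address_bits` bits."""
    return 2 ** address_bits


PIB = 2 ** 50  # pebibyte
EIB = 2 ** 60  # exbibyte
```

For example, `peak_bandwidth_gb_s(128, 533.33)` gives about 8.53 GB/s and `peak_bandwidth_gb_s(128, 400)` gives 6.4 GB/s. `addressable_bytes(50)` equals 1 PiB (2^50 bytes) of physical memory, while `addressable_bytes(64)` equals 16 EiB (2^64 bytes) of virtual address space.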

7.3.2 Instruction Execution


A 128-bit instruction word (bundle) contains three instructions, and two bundles per clock can be fetched from the cache into the pipeline. When this is fully used, the processor can execute six instructions per clock cycle. The processor has thirty functional execution units in eleven groups. Each unit executes a particular subset of the instruction set, at a rate of one instruction per cycle unless execution stalls waiting for data. Not all units in a group execute identical subsets of the instruction set, but common instructions can be executed in multiple units. The execution unit groups consist of:

• Six general-purpose ALUs, two integer units, one shift unit
• Four data cache units
• Six multimedia units, two parallel shift units, one parallel multiply, one population count
• Two floating-point multiply-accumulate units, two "miscellaneous" floating-point units
• Three branch units [8]
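
The "three instructions per 128-bit word" arrangement is the IA-64 instruction bundle: a 5-bit template field in bits 0 to 4, followed by three 41-bit instruction slots (5 + 3 × 41 = 128). The Python sketch below packs and unpacks those fields from a 128-bit integer to show the layout, and computes the peak issue rate of two instruction words per clock. It treats slot contents as opaque numbers and does not decode the template into execution-unit types; the helper names are hypothetical.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instruction slots")
    if not 0 <= template <= TEMPLATE_MASK or any(not 0 <= s <= SLOT_MASK for s in slots):
        raise ValueError("field out of range")
    bundle = template  # bits 0-4
    for i, slot in enumerate(slots):
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)  # slots start at bits 5, 46, 87
    return bundle


def unpack_bundle(bundle):
    """Return (template, [slot0, slot1, slot2]) from a 128-bit bundle."""
    template = bundle & TEMPLATE_MASK
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3)]
    return template, slots


def peak_instructions_per_cycle(bundles_per_cycle=2, slots_per_bundle=3):
    """Peak issue rate when every fetched slot holds a useful instruction."""
    return bundles_per_cycle * slots_per_bundle
```

`peak_instructions_per_cycle()` returns 6, matching the six instructions per clock described above, and packing then unpacking a bundle returns the original template and slots unchanged.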

7.3.3 Memory architecture


The Itanium 2 processors have three levels of cache. The level 1 cache has 16 KB for instructions and another 16 KB for data. The level 2 cache is 256 KB, shared by instructions and data. The level 3 cache varies from 1.5 MB to 24 MB. The level 2 cache contains the logic needed to handle semaphore operations. Main memory is accessed through a bus to an off-chip chipset; the bus transfers 2 × 128 bits per clock cycle. [8]
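
A common way to reason about a three-level hierarchy like this is average memory access time (AMAT), computed from the outermost cache inwards: each level costs its hit time plus its miss rate times the cost of going one level further out. The formula is standard textbook material rather than something stated in the sources above, and any latencies and miss rates passed to the Python sketch below are placeholders chosen by the caller, not Itanium 2 measurements. The `amat` name is hypothetical.

```python
def amat(levels, memory_latency):
    """Average memory access time for a cache hierarchy.

    `levels` is a list of (hit_time, local_miss_rate) pairs ordered from L1
    outwards; `memory_latency` is the cost of reaching main memory.
    All times are in the same unit, for example clock cycles.
    """
    time = memory_latency
    for hit_time, miss_rate in reversed(levels):
        # Cost of an access that reaches this level: hit time, plus a
        # fraction `miss_rate` of the cost of the next level out.
        time = hit_time + miss_rate * time
    return time
```

For instance, with purely hypothetical per-level values of (1 cycle, 5% miss), (5 cycles, 20% miss) and (14 cycles, 50% miss) and a 200-cycle memory latency, `amat` gives 2.39 cycles: most accesses are served quickly by L1, and the larger L2 and L3 caches absorb most of the remaining misses before they reach main memory.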

7.4 Features
Some of the key features of the Itanium include:
EPIC Architecture
With the explicitly parallel instruction computing offered by the Intel Itanium architecture, high-end enterprise and business workloads can be addressed.
Dual Core Processing
With two complete 64-bit cores in a single processor, the clock speed limitation of single-core processors is addressed.
Intel Hyper-Threading Technology
Compared with a single core without Hyper-Threading, four times as many application threads can run at a time.
Intel Virtualization Technology
With its scalability and dual cores, it provides an excellent platform for data-intensive virtualization.
Cache Safe Technology
Minimizes cache errors and allows the processor to keep operating even when errors occur.
Energy Efficiency
Consumes less power than earlier series, increasing performance per watt.
Security
Faster data encryption, robust memory and hardware authentication of firmware help keep data secure.
Features to support flexible platform environments:
An IA-32 execution layer is available in the Itanium 2 to support IA-32 application binaries. The
processor contains an abstraction layer that eliminates processor dependencies. [9]

7.5 Software support

To improve software support for Itanium, Intel supported the development of effective compilers for the platform, including GCC, Open64 and Microsoft Visual Studio. Itanium is supported by Windows Server 2003, Windows Server 2008 and multiple Linux distributions. Itanium also supports Groupe Bull's GCOS mainframe environment and several IA-32 operating systems via instruction set simulators. According to the Itanium Solutions Alliance, as of early 2008 more than 13,000 applications were available for Itanium-based systems. [8] The Itanium Solutions Alliance also supports Gelato, an Itanium HPC user group and developer community that supports open-source software for Itanium. [8]

7.6 Competition

The Itanium 2 competes in the enterprise server and high-performance computing (HPC) markets. Its major competitors include Sun Microsystems' UltraSPARC IV+, Fujitsu's SPARC64, IBM's POWER6, AMD's Opteron and Intel's own Xeon servers. Itanium has had the best floating-point performance relative to fixed-point performance of any general-purpose microprocessor. [8]

8. CONCLUSION

This seminar helped us explore parallel technology in the area of dual-core processors and their architecture. We gained an understanding of dual-core, quad-core and multi-core development and studied the key features of the Intel Itanium 2 architecture. Overall, dual-core processors have not yet made the expected impact on the market, because thread-level parallelism is not yet used extensively in application development. Sooner or later, however, multicore processors will be used to their full capability, with operating systems and applications that support multithreading.

REFERENCES

[1] http://en.wikipedia.org/wiki/Dual_Core
[2] http://mediacoder.sourceforge.net/wiki/index.php/Multi-Core
[3] www.ccur.com/pdf/Preparing_forthe_multicore_revolution.pdf
[4] http://icrontic.com/articles/dual_core
[5] http://www.widowpc.com/2006/01/dual_core_compu.php
[6] http://www.wisegeek.com/what-is-a-dual-core-processor.htm
[7] http://www.intel.com/technology/computing/dual-core/
[8] http://en.wikipedia.org/wiki/Itanium
[9] http://download.intel.com/products/processor/itanium2/dc_prod_brief.pdf
