
Welcome to the Parallel Jungle!

By Herb Sutter, January 29, 2012

Herb Sutter dives into the repercussions of parallelism's reach from mobile devices to the desktop, to clusters, and, at the highest level of granularity, to the cloud. This welter of different parallel implementations presents significant challenges for programming. The free lunch of sequential programming is well and truly over.
In the twilight of Moore's Law, the transitions to multicore processors, GPU computing, and hardware or infrastructure as a service (HaaS) cloud computing are not separate trends, but aspects of a single trend: mainstream computers, from desktops to "smartphones," are being permanently transformed into heterogeneous supercomputer clusters. Henceforth, a single compute-intensive application will need to harness different kinds of cores, in immense numbers, to get its job done. The free lunch is over. Now welcome to the hardware jungle.

From 1975 to 2005, our industry accomplished a phenomenal mission: In 30 years, we put a personal computer on every desk, in every home, and in every pocket.

In 2005, however, mainstream computing hit a wall. In "The Free Lunch Is Over (A Fundamental Turn Toward Concurrency in Software)," I described the reasons for the then-upcoming industry transition from single-core to multicore CPUs in mainstream machines, why it would require changes throughout the software stack from operating systems to languages to tools, and why it would permanently affect the way we as software developers have to write our code if we want our applications to continue exploiting Moore's transistor dividend.

In 2005, our industry undertook a new mission: to put a personal parallel supercomputer on every desk, in every home, and in every pocket. 2011 was special: It's the year that we completed the transition to parallel computing in all mainstream form factors, with the arrival of multicore tablets (such as iPad 2, Playbook, Kindle Fire, Nook Tablet) and smartphones (for example, Galaxy S II, Droid X2, iPhone 4S). 2012 will see us continue to build out multicore with mainstream quad- and eight-core tablets (as Windows 8 brings a modern tablet experience to x86 as well as ARM), and the last single-core gaming console holdout will go multicore (as Nintendo's Wii U replaces the Wii).

This time, it took us just six years to deliver mainstream parallel computing in all popular form factors. And we know the transition to multicore is permanent, because multicore delivers compute performance that single-core cannot, and there will always be mainstream applications that run better on a multicore machine. There's no going back. For the first time in the history of computing, mainstream hardware is no longer a single-processor von Neumann machine, and never will be again. That was the first act.
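To make that change of mission concrete, here is a minimal sketch (mine, not from the original article) of what the multicore transition demands of code: the same computation written once sequentially, so that it runs no faster on eight cores than on one, and once spread across however many cores the machine offers, using C++11 std::async. The function names and chunking strategy are illustrative assumptions, not a prescribed pattern.

    #include <algorithm>
    #include <cstddef>
    #include <future>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Sequential version: leaves every core but one idle.
    double sum_sequential(const std::vector<double>& v) {
        return std::accumulate(v.begin(), v.end(), 0.0);
    }

    // Parallel version: splits the work into one chunk per hardware thread,
    // so throughput can scale with the core count.
    double sum_parallel(const std::vector<double>& v) {
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        std::size_t chunk = v.size() / n;
        std::vector<std::future<double>> parts;
        for (unsigned i = 0; i < n; ++i) {
            auto first = v.begin() + i * chunk;
            auto last  = (i + 1 == n) ? v.end() : first + chunk;
            parts.push_back(std::async(std::launch::async,
                [first, last] { return std::accumulate(first, last, 0.0); }));
        }
        double total = 0.0;
        for (auto& f : parts) total += f.get();   // join and combine
        return total;
    }

    int main() {
        std::vector<double> data(10000000, 1.0);
        std::cout << sum_parallel(data) << '\n';  // same answer, more cores used
    }

The point is not this particular reduction, but that the parallel version keeps working, and keeps speeding up, as the core count grows.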

It turns out that multicore is just the first of three related permanent transitions that layer on and amplify each other, as the timeline in Figure 1 illustrates.

Figure 1. The timeline of the three transitions.

1. Multicore (2005-). As explained previously.

2. Heterogeneous cores (2009-). A single computer already typically includes more than one kind of processor core, as mainstream notebooks, consoles, and tablets all increasingly have both CPUs and compute-capable GPUs. The open question in the industry today is not whether a single application will be spread across different kinds of cores, but only "how different" the cores should be: whether they should be basically the same, with similar instruction sets but in a mix of a few big cores that are best at sequential code plus many smaller cores best at running parallel code (the Intel MIC model slated to arrive in 2012-2013, which is easier to program), or cores with different capabilities that may support only subsets of general-purpose languages like C and C++ (the current Cell and GPGPU model, which requires more complexity, including language extensions and subsets; see the sketch after this list). Heterogeneity amplifies the first trend (multicore), because if some of the cores are smaller, then we can fit more of them on the same chip. Indeed, 100x and 1,000x parallelism is already available today on many mainstream home machines for programs that can harness the GPU. We know the transition to heterogeneous cores is permanent because different kinds of computations naturally run faster and/or use less power on different kinds of cores, and different parts of the same application will run faster and/or cooler on a machine with several different kinds of cores.

3. Elastic compute cloud cores (2010-). For our purposes, "cloud" means specifically HaaS: delivering access to more computational hardware as an extension of the mainstream machine. This trend started to hit the mainstream with commercial compute cloud offerings from Amazon Web Services (AWS), Microsoft Azure, Google App Engine (GAE), and others.
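As a concrete illustration of the "language extensions and subsets" point in item 2, here is a minimal GPGPU sketch (my example, not the article's) written with C++ AMP, one of several ways to target compute-capable GPU cores from C++; it assumes a C++ AMP-capable compiler such as Visual C++ 2012. The restrict(amp) annotation marks exactly the kind of language subset such cores impose.

    #include <amp.h>      // C++ AMP: a C++ extension for accelerator compute
    #include <vector>
    using namespace concurrency;

    // Element-wise vector addition, offloaded to the GPU (or other accelerator).
    void vector_add(const std::vector<float>& a,
                    const std::vector<float>& b,
                    std::vector<float>& c) {
        const int n = static_cast<int>(a.size());
        array_view<const float, 1> av(n, a), bv(n, b);   // wrap host data
        array_view<float, 1> cv(n, c);
        cv.discard_data();                               // no need to copy c in

        // The lambda runs once per element, in parallel, on the accelerator.
        // restrict(amp) limits its body to the subset of C++ the GPU supports.
        parallel_for_each(cv.extent, [=](index<1> i) restrict(amp) {
            cv[i] = av[i] + bv[i];
        });
        cv.synchronize();                                // copy results back
    }

The same shape of code maps onto CUDA or OpenCL; what differs between the models is how much of full C++ the device-side subset admits.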


Cloud HaaS again amplifies both of the first two trends, because it's fundamentally about deploying large numbers of nodes, where each node is a mainstream machine containing multiple and heterogeneous cores. In the cloud, the number of cores available to a single application is scaling fast (in mid-2011, Cycle Computing delivered a 30,000-core cluster for under $1,300/hour using AWS), and the same heterogeneous cores are available in compute nodes (e.g., AWS already offers "Cluster GPU" nodes with dual NVIDIA Tesla M2050 GPU cards, enabling massively parallel and massively distributed CUDA applications). In short, parallelism is not just in full bloom, but increasingly in full variety.

This article will develop four key points:

1. Moore's End. We can observe clear evidence that Moore's Law is ending, because we can point to a pattern that precedes the end of exploiting any kind of resource. But there's no reason to panic, because Moore's Law limits only one kind of scaling, and we have already started another kind.

2. Mapping one trend, not three. Multicore, heterogeneous cores, and HaaS cloud computing are not three separate trends, but aspects of a single trend: putting a personal heterogeneous supercomputer cluster on every desk, in every home, and in every pocket.

3. The effect on software development. As software developers, we will be expected to enable a single application to exploit a jungle of enormous numbers of cores that are increasingly different in kind (specialized for different tasks) and different in location (from local to very remote; on-die, in-box, on-premises, in-cloud). The jungle of heterogeneity will continue to spur deep and fast evolution of mainstream software development, but we can predict what some of the changes will be.

4. Three distinct near-term stages of Moore's End. And why "smartphones" aren't, really.

Mainstream hardware is becoming permanently parallel, heterogeneous, and distributed. These changes are permanent, and so will permanently affect the way we have to write performance-intensive code on mainstream architectures.

The good news is that Moore's "local scale-in" transistor mine isn't empty yet. It appears the transistor bonanza will continue for about another decade, give or take a half-decade or so, which should be long enough to exploit the lower-cost side of the Law to get us to parity between desktops and pocket tablets. The bad news is that we can clearly observe the diminishing returns: the transistors are decreasingly exploitable, so with each new generation of processors, software developers have to work harder and the chips get more difficult to power. And with each new crank of the diminishing-returns wheel, there's less time for hardware and software designers to come up with ways to overcome the next hurdle; the motherlode free lunch lasted 30 years, but the homogeneous multicore era lasted only about six years, and we are now already overlapping the next two eras of hetero-core and cloud-core.

But all is well: When your mine is getting empty, you don't panic; you just open a new mine at a new motherlode, operate both mines for a while, then continue to profit from the new mine long-term even after the first one finally shuts down and gets converted into a museum. As usual, the end of one dominant wave overlaps with the beginning of the next, and we are now early in the period of overlap, standing with a foot in each wave and a crew in each mine: Moore's mine and the cloud mine. Perhaps the best news of all is that the cloud wave is already scaling enormously quickly, faster than the Moore's Law wave that it complements, and that it will outlive and replace.

If you haven't done so already, now is the time to take a hard look at the design of your applications, determine what existing features (or, better still, what potential and currently unimaginable demanding new features) are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from local and distributed parallelism. Now is also the time for you and your team to grok the requirements, pitfalls, styles, and idioms of hetero-parallel (e.g., GPGPU) and cloud programming (e.g., Amazon Web Services, Microsoft Azure, Google App Engine).

To continue enjoying the free lunch of shipping an application that runs well on today's hardware and will just naturally run faster or better on tomorrow's hardware, you need to write an app with lots of latent parallelism, expressed in a form that can be spread across a machine with a variable number of cores of different kinds: local and distributed cores, and big/small/specialized cores. The throughput gains now cost extra: extra development effort, extra code complexity, and extra testing effort. The good news is that for many classes of applications the extra effort will be worthwhile, because concurrency will let them fully exploit the exponential gains in compute throughput that will continue to grow strong and fast long after Moore's Law has gone into its sunny retirement, as we continue to mine the cloud for the rest of our careers.
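Here is one minimal sketch (again mine, under assumed names: process_item stands in for whatever CPU-intensive, independent unit of work your application does per item) of what "latent parallelism" can look like: the work is described as a pool of independent tasks, and the runtime decides how many run at once.

    #include <future>
    #include <vector>

    // Hypothetical placeholder for a CPU-intensive, independent computation.
    int process_item(int item) { return item * item; }

    // Express the work as many independent tasks rather than one big loop.
    // The same code spreads itself over 2, 8, or 48 cores without change.
    std::vector<int> process_all(const std::vector<int>& items) {
        std::vector<std::future<int>> tasks;
        tasks.reserve(items.size());
        for (int item : items)
            // Default launch policy: the runtime may run each task on
            // another core or defer it. The parallelism is latent --
            // expressed by the structure of the code, not demanded of
            // any particular number of cores.
            tasks.push_back(std::async(process_item, item));

        std::vector<int> results;
        results.reserve(items.size());
        for (auto& t : tasks) results.push_back(t.get());
        return results;
    }

The design choice is the point: because the code states only that the tasks are independent, the scaling knob stays in the runtime's hands, and the same expressed parallelism could later be mapped onto bigger pools of local, specialized, or distributed cores.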
