2015, Communications of the ACM
Optimizing NUMA systems applications with Carrefour.
Proceedings of the thirteenth ACM symposium on Operating systems principles - SOSP '91, 1991
The study of operating systems level memory management policies for nonuniform memory access time (NUMA) shared memory multiprocessors is an area of active research. Previous results have suggested that the best policy choice often depends on the application under consideration, while others have reported that the best policy depends on the particular architecture. Since both observations have merit, we explore the concept of policy tuning on an application/architecture basis. We introduce a highly tunable dynamic page placement policy for NUMA multiprocessors, and address issues related to the tuning of that policy to different architectures and applications. Experimental data acquired from our DUnX operating system running on two different NUMA multiprocessors are used to evaluate the usefulness, importance, and ease of policy tuning. Our results indicate that while varying some of the parameters can have dramatic effects on performance, it is easy to select a set of default parameter settings that result in good performance for each of our test applications on both architectures. This apparent robustness of our parameterized policy raises the possibility of machine-independent memory management for NUMA-class machines.
IEEE Transactions on Parallel and Distributed Systems, 1992
The class of NUMA (nonuniform memory access time) shared memory architectures is becoming increasingly important with the desire for larger scale multiprocessors. In such machines, the placement and movement of code and data are crucial to performance. The operating system can play a role in managing placement through the policies and mechanisms of the virtual memory subsystem. In this paper, we explore dynamic page placement policies using two approaches that complement each other in important ways. On one hand, we measure the performance of parallel programs running on the experimental DUnX operating system kernel for the BBN GP1000 which supports a highly parameterized dynamic page placement policy. We also develop and apply an analytic model of memory system performance of a Local/Remote NUMA architecture based on approximate mean-value analysis techniques. The model assumes that a simple workload model based on a few parameters can often provide insight into the general behavior of real applications. The model is validated against experimental data obtained with DUnX while running a synthetic workload. The results of this validation show that in general, model predictions are quite good, though in some cases the model fails to include the effect of subtle behaviors in the implementation. Experiments investigate the effectiveness of dynamic page-placement and, in particular, dynamic multiple-copy page placement, the cost of replication/coherency fault errors, and the cost of errors in deciding whether a page should move or be remotely referenced.
ACM Sigarch Computer Architecture News, 1991
Multiprocessor memory reference traces provide a wealth of information on the behavior of parallel programs. We have used this information to explore the relationship between kernel-based NUMA management policies and multiprocessor memory architecture. Our trace analysis techniques employ an off-line, optimal cost policy as a baseline against which to compare on-line policies, and as a policyinsensitive tool for evaluating architectural design alternatives. We compare the performance of our optimal policy with that of three implementable policies (two of which appear in previous work), on a variety of applications, with varying relative speeds for page moves and local, global, and remote memory references. Our results indicate that a good NUMA policy must be chosen to match its machine, and confirm that such policies can be both simple and effective. They also indicate that programs for NUMA machines must be written with care to obtain the best performance.
2012
Definition: NUMA is the acronym for Non-Uniform Memory Access. A NUMA cache is a cache memory in which the access time is not uniform but depends on the position of the involved block inside the cache. Among NUMA caches, it is possible to distinguish: (1) the NUCA (Non-Uniform Cache Access) architectures, in which the memory space is deeply sub-banked, and the access latency depends on which sub-bank is accessed; and (2) the shared and distributed cache of a tiled Chip Multiprocessor (CMP), in which the latency depends on which cache slice has to be accessed.
Lecture Notes in Computer Science, 2012
Some typical memory access patterns are provided and programmed in C, which can be used as benchmarks to characterize the various techniques and algorithms that aim to improve the performance of NUMA memory access. These access patterns, called MAP-numa (Memory Access Patterns for NUMA), currently include three classes, whose working data sets correspond to a 1-dimension array, a 2-dimension matrix, and a 3-dimension cube. MAP-numa is dedicated to NUMA memory access optimization rather than to measuring memory bandwidth and latency, and is an alternative to existing benchmarks such as STREAM, pChase, etc. It is used to verify the capacity of optimizations (made automatically or manually to source code or executable binaries) by investigating what locality leakage can be remedied. Some experimental results are shown, which give an example of using MAP-numa to evaluate optimizations based on Oprofile sampling.
2010
NUMA is becoming more widespread in the marketplace, used on many systems, small or large, particularly with the advent of AMD Opteron systems. This paper will cover a summary of the current state of NUMA, and future developments, encompassing the VM subsystem, scheduler, topology (CPU, memory, I/O layouts including complex non-uniform layouts), userspace interface APIs, and network and disk I/O locality. It will take a broad-based approach, focusing on the challenges of creating subsystems that work for all machines (including AMD64, PPC64, IA-32, IA-64, etc.), rather than just one architecture.
1 What is a NUMA machine? NUMA stands for non-uniform memory architecture. Typically this means that not all memory is the same “distance” from each CPU in the system, but also applies to other features such as I/O buses. The word “distance” in this context is generally used to refer to both latency and bandwidth. Typically, NUMA machines can access any resource in the system, just at diffe...
Encyclopedia of Parallel Computing (David Padua, ed.), 2012
2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2016
Modern NUMA platforms offer large numbers of cores to boost performance through parallelism and multi-threading. However, because performance scalability is limited by available memory bandwidth, the strategy of allocating all cores can result in degraded performance. Consequently, accurately predicting optimal (best performing) core allocations, and executing applications with these allocations, are crucial for achieving the best performance. Previous research focused on the prediction of optimal numbers of cores. However, in this paper, we show that, because of the asymmetric NUMA memory configuration and the asymmetric application memory behavior, optimal core allocations are not merely optimal numbers of cores. Additionally, previous studies do not adequately consider NUMA memory resources, which further limits their ability to accurately predict optimal core allocations. In this paper, we present a model, NuCore, which predicts both memory bandwidth usage and optimal core allocations. NuCore considers various memory resources and NUMA asymmetry, and employs Integer Programming to achieve high accuracy and low overhead. Experimental results from real NUMA machines show that the core allocations predicted by NuCore provide 1.27x average speedup over using all cores with only 75.6% of cores allocated. NuCore also provides 1.18x and 1.21x average speedups over two state-of-the-art techniques. Our results also show that NuCore faithfully models NUMA memory systems and predicts memory bandwidth usage with only 10% average error.
Proceedings of the 29th conference on Winter simulation - WSC '97, 1997
We present a system for describing and solving closed queuing network models of the memory access performance of NUMA architectures. The system consists of a model description language, solver engines based upon both discrete event simulation and derivatives of the Mean Value Analysis (MVA) algorithm, and a model manager used to translate model descriptions to the forms required by the solvers.