Presented at LISA18: https://www.usenix.org/conference/lisa18/presentation/babrou
This is a technical dive into how we used eBPF to solve real-world issues uncovered during an innocent OS upgrade. We'll see how we debugged a 10x CPU increase in Kafka after a Debian upgrade and what lessons we learned. We'll go from high-level effects like increased CPU, to flamegraphs showing where the problem lies, to tracing timers and function calls in the Linux kernel.
The focus is on tools that operational engineers can use to debug performance issues in production. This particular issue happened at Cloudflare on a Kafka cluster doing 100Gbps of ingress and many multiples of that in egress.
3. What does Cloudflare do
CDN: moving content physically closer to visitors.
● Intelligent caching
● Unlimited DDoS mitigation
● Unlimited bandwidth at flat pricing with free plans
● Edge access control
● IPFS gateway
● Onion service
Website Optimization: making the web fast and up to date for everyone.
● TLS 1.3 (with 0-RTT)
● HTTP/2 + QUIC
● Server push
● AMP
● Origin load-balancing
● Smart routing
● Serverless / Edge Workers
● Post-quantum crypto
DNS: Cloudflare is the fastest managed DNS provider in the world.
● 1.1.1.1
● 2606:4700:4700::1111
● DNS over TLS
4. 160+ data centers globally
● 4.5M+ DNS requests/s across authoritative, recursive and internal
● 10% of Internet requests every day
● 10M+ HTTP requests/second
● 10M+ websites, apps & APIs in 150 countries
● 20Tbps network capacity on Cloudflare’s anycast network
6. Link to slides with speaker notes
Slideshare doesn’t allow links on the first 3 slides
7. Cloudflare is a Debian shop
● All machines were running Debian Jessie on bare metal
● OS boots over PXE into memory, packages and configs are ephemeral
● Kernel can be swapped as easily as the OS
● New Stable (stretch) came out, we wanted to keep up
● Very easy to upgrade:
○ Build all packages for both distributions
○ Upgrade machines in groups, look at metrics, fix issues, repeat
○ Gradually phase out Jessie
○ Pop a bottle of champagne and celebrate
8. Cloudflare core Kafka platform at the time
● Kafka is a distributed log with multiple producers and consumers
● 3 clusters: 2 small (dns + logs) with 9 nodes, 1 big (http) with 106 nodes
● 2 x 10C Intel Xeon E5-2630 v4 @ 2.2GHz (40 logical CPUs), 128GB RAM
● 12 x 800GB SSD in RAID0
● 2 x 10G bonded NIC
● Mostly network bound at ~100Gbps ingress and ~700Gbps egress
● Check out our blog post on Kafka compression
● We also blogged about our Gen 9 edge machines recently
11. RCU stalls in dmesg
[ 4923.462841] INFO: rcu_sched self-detected stall on CPU
[ 4923.462843] 13-...: (2 GPs behind) idle=ea7/140000000000001/0 softirq=1/2 fqs=4198
[ 4923.462845] (t=8403 jiffies g=110722 c=110721 q=6440)
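These reports land in the kernel ring buffer; a generic way to watch for them live (not specific to this incident) is:

# Follow kernel messages with human-readable timestamps and
# flag RCU stall reports as they appear.
sudo dmesg -Tw | grep -i rcu_sched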
12. Error logging issues
Aug 15 21:51:35 myhost kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 15 21:51:35 myhost kernel: 26-...: (1881 ticks this GP) idle=76f/140000000000000/0
softirq=8/8 fqs=365
Aug 15 21:51:35 myhost kernel: (detected by 0, t=2102 jiffies, g=1837293, c=1837292, q=262)
Aug 15 21:51:35 myhost kernel: Task dump for CPU 26:
Aug 15 21:51:35 myhost kernel: java R running task 13488 1714 1513 0x00080188
Aug 15 21:51:35 myhost kernel: ffffc9000d1f7898 ffffffff814ee977 ffff88103f410400 000000000000000a
Aug 15 21:51:35 myhost kernel: 0000000000000041 ffffffff82203142 ffffc9000d1f78c0 ffffffff814eea10
Aug 15 21:51:35 myhost kernel: 0000000000000041 ffffffff82203142 ffff88103f410400 ffffc9000d1f7920
Aug 15 21:51:35 myhost kernel: Call Trace:
Aug 15 21:51:35 myhost kernel: [<ffffffff814ee977>] ? scrup+0x147/0x160
Aug 15 21:51:35 myhost kernel: [<ffffffff814eea10>] ? lf+0x80/0x90
Aug 15 21:51:35 myhost kernel: [<ffffffff814eecb5>] ? vt_console_print+0x295/0x3c0
13. Page allocation failures
Aug 16 01:14:51 myhost systemd-journald[13812]: Missed 17171 kernel messages
Aug 16 01:14:51 myhost kernel: [<ffffffff81171754>] shrink_inactive_list+0x1f4/0x4f0
Aug 16 01:14:51 myhost kernel: [<ffffffff8117234b>] shrink_node_memcg+0x5bb/0x780
Aug 16 01:14:51 myhost kernel: [<ffffffff811725e2>] shrink_node+0xd2/0x2f0
Aug 16 01:14:51 myhost kernel: [<ffffffff811728ef>] do_try_to_free_pages+0xef/0x310
Aug 16 01:14:51 myhost kernel: [<ffffffff81172be5>] try_to_free_pages+0xd5/0x180
Aug 16 01:14:51 myhost kernel: [<ffffffff811632db>] __alloc_pages_slowpath+0x31b/0xb80
...
[78991.546088] systemd-network: page allocation stalls for 287000ms, order:0,
mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
14. Downgrade and investigate
● System CPU was up, so it must have been the kernel upgrade (a quick check for this is sketched after this list)
● Downgrade Stretch to Jessie
● Downgrade Linux 4.9 to 4.4 (known good, but no allocation stall logging)
● Investigate without affecting customers
● Bisection pointed at OS upgrade, kernel was not responsible
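One quick way to see that the extra CPU is spent in the kernel rather than in user space is to watch per-CPU system time; mpstat from the sysstat package is one generic option (not taken from the talk):

# %sys rising while %usr stays flat points at the kernel/OS side.
mpstat -P ALL 1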
15. Make a flamegraph with perf
#!/bin/sh -e
# flamegraph-perf [perf args here] > flamegraph.svg
# Explicitly setting output and input to perf.data is needed to make perf work over ssh without TTY.
perf record -o perf.data "$@"
# Fetch JVM stack maps if possible, this requires -XX:+PreserveFramePointer
export JAVA_HOME=/usr/lib/jvm/oracle-java8-jdk-amd64 AGENT_HOME=/usr/local/perf-map-agent
/usr/local/flamegraph/jmaps 1>&2
IDLE_REGEXPS="^swapper;.*(cpuidle|cpu_idle|cpu_bringup_and_idle|native_safe_halt|xen_hypercall_sched_op|xen_hypercall_vcpu_op)"
perf script -i perf.data | /usr/local/flamegraph/stackcollapse-perf.pl --all | grep -E -v "$IDLE_REGEXPS" |
/usr/local/flamegraph/flamegraph.pl --colors=java --hash --title=$(hostname)
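For context, a typical invocation of the script above might look like the following; the sampling options are illustrative and not taken from the talk:

# Sample all CPUs at 99 Hz with stack traces for 30 seconds,
# then render the collapsed stacks as an SVG flame graph.
sudo ./flamegraph-perf -F 99 -a -g -- sleep 30 > kafka.svg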
30. Diff of the most popular stack
--- jessie.txt 2017-08-16 21:14:13.000000000 -0700
+++ stretch.txt 2017-08-16 21:14:20.000000000 -0700
@@ -1,4 +1,9 @@
tcp_push_one
+inet_sendmsg
+sock_sendmsg
+kernel_sendmsg
+sock_no_sendpage
+tcp_sendpage
inet_sendpage
kernel_sendpage
sock_sendpage
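The two per-host files being diffed can be derived from the collapsed perf output; a rough sketch, assuming the stackcollapse-perf.pl output was saved as jessie.folded and stretch.folded (hypothetical filenames) and that frame names contain no spaces:

# Take the most frequent collapsed stack, drop the sample count,
# and print one frame per line so the two hosts can be compared.
sort -k2,2 -nr jessie.folded | head -1 | awk '{ print $1 }' | tr ';' '\n' > jessie.txt
sort -k2,2 -nr stretch.folded | head -1 | awk '{ print $1 }' | tr ';' '\n' > stretch.txt
diff -u jessie.txt stretch.txt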
31. Let’s look at tcp_sendpage
int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags)
{
        ssize_t res;

        if (!(sk->sk_route_caps & NETIF_F_SG) ||        /* <-- segmentation offload */
            !sk_check_csum_caps(sk))
                return sock_no_sendpage(sk->sk_socket, page, offset, size,
                                        flags);         /* <-- what we see on the stack */

        lock_sock(sk);
        tcp_rate_check_app_limited(sk); /* is sending application-limited? */
        res = do_tcp_sendpages(sk, page, offset, size, flags);
        release_sock(sk);
        return res;
}
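To confirm at runtime which branch the kernel takes, one could simply count calls to both functions. A minimal bpftrace sketch, assuming bpftrace is available (it is not the tool used in the talk) and that these symbols exist in your kernel:

# If sock_no_sendpage fires about as often as tcp_sendpage itself,
# the NETIF_F_SG / checksum capability check above is failing.
sudo bpftrace -e '
  kprobe:tcp_sendpage     { @sendpage[comm] = count(); }
  kprobe:sock_no_sendpage { @fallback[comm] = count(); }
'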
34. Compare ethtool -k settings on vlan10
-tx-checksumming: off
+tx-checksumming: on
- tx-checksum-ip-generic: off
+ tx-checksum-ip-generic: on
-scatter-gather: off
- tx-scatter-gather: off
+scatter-gather: on
+ tx-scatter-gather: on
-tcp-segmentation-offload: off
- tx-tcp-segmentation: off [requested on]
- tx-tcp-ecn-segmentation: off [requested on]
- tx-tcp-mangleid-segmentation: off [requested on]
- tx-tcp6-segmentation: off [requested on]
-udp-fragmentation-offload: off [requested on]
-generic-segmentation-offload: off [requested on]
+tcp-segmentation-offload: on
+ tx-tcp-segmentation: on
+ tx-tcp-ecn-segmentation: on
+ tx-tcp-mangleid-segmentation: on
+ tx-tcp6-segmentation: on
+udp-fragmentation-offload: on
+generic-segmentation-offload: on
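A diff like the one above can be produced directly from two hosts; a sketch, with hypothetical hostnames and assuming bash for process substitution:

# Compare offload settings of the vlan interface on an upgraded
# host against a host that is still on the old release.
diff -u <(ssh kafka-jessie ethtool -k vlan10) <(ssh kafka-stretch ethtool -k vlan10)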
35. Ha! Easy fix, let’s just enable it:
$ sudo ethtool -K vlan10 sg on
Actual changes:
tx-checksumming: on
tx-checksum-ip-generic: on
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: on
tx-tcp6-segmentation: on
udp-fragmentation-offload: on
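Note that ethtool -K only changes the running configuration. A sketch of one generic way to persist it on Debian with ifupdown; the stanza below is an assumption for illustration, not necessarily how this was actually solved at Cloudflare:

# /etc/network/interfaces.d/vlan10 (hypothetical)
auto vlan10
iface vlan10 inet manual
    post-up /sbin/ethtool -K vlan10 sg on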
40. Lessons learned
● It’s important to pay close attention even to seemingly unrelated metrics
● Linux kernel can be easily traced with perf and bcc tools
○ Tools work out of the box (see the example after this list)
○ You don’t have to be a developer
● TCP offload is incredibly important and applies to vlan interfaces
● Switching OS on reboot proved to be useful
41. But really it was just an excuse
● Internal blog post about this is from Aug 2017
● External blog post in Cloudflare blog is from May 2018
● All to show where ebpf_exporter can be useful
○ Our tool to export hidden kernel metrics with eBPF
○ Can trace any kernel function and hardware counters
○ IO latency histograms, timer counters, TCP retransmits, etc.
○ Exports data in Prometheus (OpenMetrics) format
42. Can be nicely visualized with new Grafana
Disk upgrade in production
43. Thank you
● Blog post this talk is based on
● Github for ebpf_exporter: https://github.com/cloudflare/ebpf_exporter
● Slides for ebpf_exporter talk with presenter notes (and a blog post)
○ Disclaimer: contains statistical dinosaur gifs
● Training on ebpf_exporter with Alexander Huynh
○ Look for “Hidden Linux Metrics with Prometheus eBPF Exporter”
○ Wednesday, Oct 31st, 11:45 - 12:30, Cumberland room 3-4
● We’re hiring
Ivan on twitter: @ibobrik
Editor's Notes
#2: Hello,
Today we’re going to go through one production issue from start to finish and see how we can apply dynamic tracing to get to the bottom of the problem.
#3: My name is Ivan and I work for a company called Cloudflare, where I focus on performance and efficiency of our products.
#4: To give you some context, these are some key areas Cloudflare specializes in.
In addition to being a good old CDN service with free unlimited DDoS protection, we try to be at the front of innovation with technologies like TLS 1.3, QUIC and edge workers, making the internet faster and more secure for end users and website owners.
We’re also the fastest authoritative and recursive DNS provider. Our resolver 1.1.1.1 is privacy oriented and supports things like DNS over TLS, stopping intermediaries from knowing your DNS requests, not to mention DNSSEC.
If you have a website of any size, you should totally put this behind Cloudflare.
#5: Here are some numbers to give you an idea of the scale we operate on.
We have 160 datacenters around the world and plan to grow to at least 200 next year.
At peak these datacenters process more than 10 million HTTP requests per second. At the same time the very same datacenters serve 4.5 million DNS requests per second across internal and external DNS.
That’s a lot of data to analyze and we collect logs into core datacenters for processing and analytics.
#6: I often get frustrated when people show numbers that are not scaled to seconds. I figured I can’t beat them, so I may as well just join them.
Here you see numbers per day. My favorite one is network capacity, which is 1.73 exabytes per day.
As you can see, these numbers make no sense. It gets even weirder when different metrics are scaled to different time units.
Please don’t use this as a reference, always scale down to seconds.
#8: Now to set a scene for this talk specifically, it makes sense to tell a little on our hardware and software stack.
All machines serving traffic and doing backend analytics are bare metal servers running Debian, at that point in time we were running Jessie.
We’re big fans of ephemeral stuff and not a single machine has an OS installed on persistent storage. Instead, we boot a minimal immutable initramfs from the network and install all packages and configs on top of that into ramfs with a configuration management system. This means that on reboot every machine is clean, and the OS and kernel can be swapped with just a reboot.
And the story starts with my personal desire to update Debian to the latest Stable release, which was Stretch at that time.
Our plan for this upgrade was quite simple because of our setup. We could just build all necessary packages for both distributions, switch a group of machines to Stretch, fix what’s broken and carry on to the next group. No need to wipe disks, reinstall anything or deal with dependency issues. We only needed to build one OS image, as opposed to one image per workload.
On the edge every machine is the same, so that part was trivial. In core datacenters, where backend out-of-band processing happens, we have different machines doing different workloads, which means a more diverse set of metrics to look at, but we could also switch whole groups over faster.
#9: One of such groups was a set of our Kafka clusters. If you’re not familiar with Kafka, it’s basically a distributed log system. Multiple producers append messages to topics and then multiple consumers read those logs. For the most part we’re using it as a queue with a large on-disk buffer that can get us time to fix issues in consumers without losing data.
We have three major clusters: DNS and Logs are small with just 9 nodes each, and HTTP is massive with 106 nodes.
You can see the specs for HTTP cluster at that time on the slides: 128GB of RAM and two Broadwell Xeon CPUs in NUMA setup with 40 logical CPUs.
We opted for 12 SSDs in RAID0 to prevent IO thrashing from consumers falling out of page cache. Disk level redundancy is absent in favor of larger usable disk space and higher throughput; we rely on 3x replication instead.
In terms of network we had 2x10G NICs in a bonded setup for maximum network throughput. It was not intended to provide any redundancy.
We used to have a lot of issues with being network bound, but in the end that was solved by aggressive compression with zstd. Funnily enough, we later opted for 2x25G NICs, just because they are cheaper, even though we are not network bound anymore.
Check out our blog post about Kafka compression or a recent one about Gen 9 edge servers if you want to learn more.
#10: So we did our upgrade on small Kafka clusters and it went pretty well, at least nobody said anything and user facing metrics looked good. If you were listening to talks yesterday, that’s what apparently should be alerted on, so no alerts fired.
On the big HTTP cluster, however, we started seeing issues with consumers timing out and lagging, so we looked closer at the metrics we had.
And this is what we saw: one upgraded node was using a lot more CPU than before, 5x more in fact. By itself this is not as big of an issue, and you can see that we’re not stressing the CPUs that much. Typical Kafka CPU usage before this upgrade was around 3 logical CPUs out of 40, which leaves a lot of room.
Still, having 5x CPU usage was definitely an unexpected outcome. For control datapoints, we compared the problematic machine to another machine where no upgrade happened, and an intermediary node that received a full software stack upgrade on reboot, but not an OS upgrade, which we optimistically bundled with a minor kernel upgrade. Neither of these two nodes experienced the same CPU saturation issues, even though their setups were practically identical.
#11: For debugging CPU saturation issues, we depend on linux perf command to find the cause. It’s included with the kernel and on end user distributions you can install it with package like linux-base or something.
The first question that comes to mind when we see CPU saturation issues is what is using the CPU. In tools like top we can see what processes occupy CPU, but with perf you can see which functions inside these processes sit on CPU the most. This covers kernel and user space for well behaved programs that have a way to decode stacks. That includes C/C++ with frame pointers and Go.
Here you can see top-like output from perf with the most expensive functions in terms of CPU time. Sorting is a bit confusing, because it sorts by inclusive time, but we’re mostly interested in “self” column, which shows how often the very tip of the stack is on CPU. In this case most of the time is taken by some spinlock slowpath.
Spinlocks in the kernel exist to protect critical sections from concurrent access. There are two reasons to use them:
* Critical section is small and is not contended
* The lock owner cannot sleep (interrupt handlers, for example, are not allowed to sleep)
If spinlock cannot be acquired, caller burns CPU until it can get hold of the lock. While it may sound like a questionable idea at first, there are legitimate uses for this mechanism.
In our situation it seems like spinlock is really contended and half of CPU cycles are not doing useful work.
We don’t know what lock is causing this to happen from this output, however.
There were also other symptoms, so let’s look at them first.
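If you want to reproduce this kind of view yourself, a plain perf top with call graphs is enough. This is just a sketch; the exact columns and sorting depend on your perf version:
# System-wide sampling with call graphs; the "Self" column shows time spent
# in the function itself, "Children" includes time spent in its callees
sudo perf top -a -g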
#12: If anything bad happens in production, it’s always a good idea to have a look at dmesg. Messages there can be cryptic, but they can at least point you in the right direction. Fixing an issue is 95% knowing where to find the issue.
In that particular case we saw RCU stalls, where RCU stands for read-copy-update. I’m not exactly an expert in this, but it sounds like another synchronization mechanism and it can be affected by spinlocks we saw before.
We've seen rare RCU stalls before, and our (suboptimal) solution was to reboot the machine if no other issues can be found. 99% of the time reboot fixed the issue for a long time.
However, one can only handle so many reboots before the problem becomes severe enough to warrant a deep dive. In this case we had other clues.
#13: While looking deeper into dmesg, we noticed issues around writing messages to the console.
This suggested that we were logging too many errors, and the actual failure may be earlier in the process. Armed with this knowledge, we looked at the very beginning of the message chain.
#14: And this is what we saw.
If you work with NUMA machines, you may immediately see “shrink_node” and have a minor PTSD episode.
What you should be looking at is the number of missed kernel messages. There were so many errors that journald wasn’t able to keep up. We have console access to work around that, and that’s where we saw page allocation stalls in the second log excerpt.
You don't want your page allocations to stall for 5 minutes, especially when it's order zero allocation, which is the smallest allocation of one 4 KiB page.
#15: Comparing to our control nodes, the only two possible explanations were: a minor kernel upgrade, and the switch from Debian Jessie to Debian Stretch. We suspected the former, since CPU usage implies a kernel issue.
Just to be safe, we rolled both the kernel back from 4.9 to a known good 4.4, and downgraded the affected nodes back to Debian Jessie. This was a reasonable compromise, since we needed to minimize downtime on production nodes.
Then we proceeded to look into the issue in isolation.
To our surprise, after some bisecting we found that the OS upgrade alone was responsible for our issues; the kernel was off the hook.
Now all that remained was to find out what exactly was going on.
#16: Flamegraphs are a great way to visualize stacks that cause CPU usage in the system.
We have a wrapper around Brendan Gregg’s flamegraph scripts that removes idle time and enables JVM stacks out of the box.
This gives us a way to get an overview of CPU usage in one command.
#17: And this is what full system flamegraphs look like. We have Jessie in the background on the left and Stretch in the foreground on the right.
This may be hard to see, but the idea is that each bar is a stack frame and width corresponds to frequency of this stack’s appearance, which is a proxy for CPU usage.
You can see a fat column of frames on the left on Stretch, that’s not present on Jessie. We can see it’s the sendfile syscall and it’s highlighted in purple. It’s also present and highlighted on Jessie, but it’s tiny and quite hard to see.
Flamegraphs allow you to click on the frame, which will zoom into stacks containing this frame, generating some sort of a sub-flamegraph.
#18: So let’s click on sendfile on Stretch and see what’s going on.
#19: This is what we saw. For somebody who’s not a kernel developer this just looks like a bunch of TCP stuff, which is exactly what I saw.
Some colleagues suggested that the differences in the graphs may be due to TCP offload being disabled, but upon checking our NIC settings, we found that the feature flags were identical.
You can also see some spinlocks at the tip of the flamegraph, which reinforces our initial findings with perf top.
Let’s see what else we can figure out from here.
#20: To find out what’s going on with the system, we’ll be using bcc tools. Linux kernel has a VM that allows us to attach lightweight and safe probes to trace the kernel. eBPF itself is a hot topic and there are talks that explore it in great detail, slides for this talk link to them if you are interested.
To clarify, VM here is more like JVM that provides runtime and not like KVM that provides hardware virtualization. You can compile code down to this VM from any language, so don’t look surprised when one day you’ll see javascript running in the kernel. I warned you.
For the sake of brevity let’s just say that there’s a collection of readily available utilities that can help you debug various parts of the kernel and underlying hardware. That collection is called BCC tools and we’re going to use some of these to get to the bottom of our issue.
On this slide you can see how different subsystems can be traced with different tools.
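As a rough sketch of getting started (package names vary between distributions and releases, so treat this as an assumption rather than a recipe):
# On newer Debian/Ubuntu the collection is packaged as bpfcc-tools and the
# individual tools get a -bpfcc suffix, e.g. funclatency-bpfcc
sudo apt-get install bpfcc-tools linux-headers-$(uname -r)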
#21: To trace latency distributions of sendfile syscalls between Jessie and Stretch, we’re going to use funclatency. It takes a function name and prints exponential latency histogram for the function calls. Here we print latency histogram for do_sendfile, which is sendfile syscall function, in microseconds, every second.
You can see that most of the calls on Jessie hover between 8 and 31 microseconds. Is that good or bad? I don’t know, but a good way to find out is to compare against another system.
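The exact invocation isn’t on the slide, but with stock bcc it would look roughly like this; the install path and flags may differ depending on how bcc was packaged:
# Latency histogram of do_sendfile in microseconds (-u), printed every second (-i 1)
sudo /usr/share/bcc/tools/funclatency -u -i 1 do_sendfile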
#22: Now let’s look at what’s going on with Stretch. I had to cut some parts, because the histogram did not fit on the slide.
If on Jessie we saw most of the calls complete in under 31 microseconds, here that number is 511 microseconds; that’s a whopping 16x jump in latency.
#23: In the flamegraphs, you can see timers being set at the tip (mod_timer function is responsible for that), with these timers taking locks.
We can count number of function calls instead of measuring their latency, and this is where funccount tool comes in. Feeding mod_timer as an argument to it we can see how many function calls there were every second.
Here we have Jessie on the left and Stretch on the right. On stretch we installed 3x more timers than on Jessie. That’s not 16x difference, but still something.
#24: If we look at the number of locks taken for these timers by running funccount on lock_timer_base function, we can see an even bigger difference, around 10x this time.
To sum up: on Stretch we installed 3x more timers, resulting in 10x the amount of contention. It definitely seems like we’re onto something.
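For reference, counting those calls with funccount looks roughly like this, assuming the standard bcc install path:
# Count mod_timer() calls per second across all CPUs
sudo /usr/share/bcc/tools/funccount -i 1 mod_timer
# Same for the lock acquisitions behind those timers
sudo /usr/share/bcc/tools/funccount -i 1 lock_timer_base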
#25: We can look at the kernel source code to figure out which timers are being scheduled based on the flamegraph, but that seems like a tedious task. Instead, we can use perf tool again to gather some stats on this for us.
There’s a bunch of tracepoints in the kernel that provide insight into timer subsystem. We’re going to use timer_start for our needs.
#26: Here we record all timers started for 10s and then print function names they were triggering with respective counts.
On Stretch we installed 12x more tcp_write_timer timers, which sounds like something that could cause issues. Remember: we are on a bandwidth bound workload where the interface is 20G, that’s a lot of bytes to move.
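A sketch of that recording step: capture the timer:timer_start tracepoint system-wide for 10 seconds, then tally the callback functions from the trace. The exact field name in perf script output may vary slightly between kernel versions:
# Record every timer armed during 10 seconds, system-wide
sudo perf record -e timer:timer_start -a -- sleep 10
# Extract the callback function from each event and count occurrences
sudo perf script | grep -o 'function=[^ ]*' | sort | uniq -c | sort -rn | head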
#27: Taking specific flamegraphs of the timers revealed the differences in their operation.
It’s probably hard to see, but tcp_push_one really stands out on Stretch.
Let’s dig in.
#28: The traces showed huge variations of tcp_sendmsg and tcp_push_one within sendfile, which is expected from the flamegraphs before.
#29: To introspect further, we leveraged a kernel feature available since 4.9: the ability to count and aggregate stacks in the kernel. BCC tools include the stackcount tool that does exactly that, so let’s take advantage of it.
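A minimal sketch with stackcount, using tcp_push_one because that is what the flamegraphs pointed at; the duration and tool path are assumptions:
# Aggregate kernel stacks that lead to tcp_push_one over 10 seconds
sudo /usr/share/bcc/tools/stackcount -D 10 tcp_push_one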
#30: The most popular Jessie stack is on the left and the most popular Stretch stack is on the right. There were a few much less popular stacks too, but there’s only so much one can fit on the slides.
Stretch stack was too long, “…” is the same as highlighted section in Jessie stack.
These are mostly the same and it’s not exactly fun to find the difference, so let’s just look at the diff on the next slide.
#31: We see 5 extra functions in the middle of the stack, starting with tcp_sendpage. Time to look at the source code.
Usually I just google the function name and it gives me a result to elixir.bootlin.com, where I swap “latest” to my kernel version. Source code there allows you to click on identifiers and jump around the code to navigate.
#32: This is how tcp_sendpage function looks like, I pasted it verbatim from the kernel source.
From tcp_sendpage our stack jumps into sock_no_sendpage. If you look up what NETIF_F_SG means, you’ll find it’s the scatter-gather capability that segmentation offload depends on.
Segmentation offload is a technique where kernel doesn’t split TCP stream into packets, but instead offloads this job to a NIC. This makes a big difference when you want to send large chunks of data over high speed links. That’s exactly what we are doing and we definitely want to have offload enabled.
#33: Let’s take a pause and see how we configure network on our machines. Our 2x10G NIC provides eth2 and eth3, which we then bond into bond0 interface. On top of that bond0 we create two vlan interfaces, one for public internet and one for internal network.
#34: It turned out that we had segmentation offload enabled for only a few of our NICs: eth2, eth3, and bond0. When we checked NIC settings for offload earlier, we only checked physical interfaces and bonded one, but ignored vlan interfaces, where offload was indeed missing.
#35: We compared ethtool output for vlan interface and there was our issue in plain sight.
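One way to produce a diff like the one on the slide is to dump ethtool -k from a Jessie node and a Stretch node and compare them; the hostnames below are placeholders:
# Compare offload settings on the vlan interface between two hosts
diff <(ssh kafka-jessie ethtool -k vlan10) <(ssh kafka-stretch ethtool -k vlan10)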
#36: We can just enable TCP offload by enabling scatter-gather (which is what “sg” stands for) and be done with it. Easy, right?
Imagine our disappointment when this did not work. So much work with clear indication that this is the cause and the fix did not work.
#37: The last missing piece we found was that offload changes are applied only during connection initiation. We turned Kafka off and back on again to start offloading and immediately saw positive effects, which is the green line.
This is not the 5x change I mentioned at the beginning, because we were experimenting on a lightly loaded node to avoid disruptions.
#38: Our network interfaces are managed by systemd-networkd, so it turns out the missing offload settings were ultimately a bug in systemd. It’s not clear whether upstream or Debian patches are responsible for this, however.
In the meantime, we work around our upstream issue by enabling offload features automatically on boot if they are disabled on VLAN interfaces.
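A minimal sketch of such a boot-time workaround, assuming the VLAN interfaces are called vlan10 and vlan20; the real hook we run is not shown in this talk:
#!/bin/sh -e
# Re-enable scatter-gather (and with it segmentation offload) on VLAN
# interfaces if networkd left it disabled
for iface in vlan10 vlan20; do
    if ethtool -k "$iface" | grep -q '^scatter-gather: off'; then
        ethtool -K "$iface" sg on
    fi
done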
#39: Having a fix enabled, we rebooted our logs Kafka cluster to upgrade to the latest kernel, and on 5 day CPU usage history you can see clear positive results.
#40: On DNS cluster results were more dramatic because of the higher load. On this screenshot only one node is fixed, but you can see how much better it behaves compared to the rest.
#41: The first lesson here is to pay closer attention to metrics during major upgrades. We did not see major CPU changes on a moderately loaded cluster and did not expect to see any effects on fully loaded machines. In the end we were not upgrading Kafka, which was the main consumer of user CPU, or the kernel, which was consuming system CPU.
The second lesson is how useful perf and bcc tools were at pointing us to where the issue was. These tools work out of the box, they are safe and do not require any third party kernel modules. More importantly, they do not require the operator to be a kernel expert, you just need some basic understanding of the concepts.
Another lesson is how important TCP offload is and how its importance grows non-linearly with traffic. It was unexpected that supposedly purely virtual vlan interfaces could be affected by offload, but it turned out they were. Challenge your assumptions often, I guess.
Lastly, we used our ability to swap OS and kernels on reboot to the fullest. Since the OS is never installed on disk, there was nothing to reinstall and we could iterate quickly.
#42: Internal blog post about this incident was published in August 2017, heavily truncated external blog post went out in May 2018. That external blog post is what this talk is based on.
All of it to illustrate how the tool we wrote can be used. During debugging we used bcc tools to count timers firing in the kernel ad hoc; if we had had a metric for this, we could have noticed the issue sooner just by seeing an increase on a graph. This is what ebpf_exporter gives you: you can trace any function in the kernel (and in userspace) at very low overhead and create metrics in Prometheus format from it.
For example, you can have latency histogram for disk io as a metric, which is not normally possible with procfs or anything else.
#43: Here’s a slide from my presentation of ebpf_exporter, which shows the level of detail you can get. On the left you can see IO wait time from /proc/diskstats, which is what Linux provides, and on the right you can see heatmap of IO latency, which is what ebpf_exporter enables.
With the histograms you can see how many IOs landed in a particular bucket and things like multimodal distributions can be seen. You can also see how many IOs went above some threshold, allowing you to have alerts on this.
Same goes for timers: the kernel does not keep a count of which timers fire anywhere you could collect it from.
#44: That’s all I had to talk about today. On the slides you have some links on the topic. Slides with speaker notes will be available on the LISA18 website and I’ll also tweet the link.
I encourage you to look at my talk on ebpf_exporter itself, which goes into details about why histograms are so great. It involves dinosaur gifs in a very scientific way you probably do not expect, so make sure to check that out.
My colleague Alex will be doing a training on ebpf_exporter tomorrow if you want to learn more about that, please come and talk to us. Slides have the information on time and location.
If you want to learn more about eBPF itself, you can find Brendan Gregg around and ask him as well as myself.