Papers by Talita Perciano
Dataset accompanying the article 'Rotational Dynamics and Transition Mechanisms of Surface-Adsorbed Proteins' in PNAS
Zenodo (CERN European Organization for Nuclear Research), Mar 23, 2022

Journal of Plasma Physics, Aug 1, 2022
Three machine learning techniques (multilayer perceptron, random forest, and Gaussian process) provide fast surrogate models for lower hybrid current drive (LHCD) simulations. A single GENRAY/CQL3D simulation without radial diffusion of fast electrons requires several minutes of wall-clock time to complete, which is acceptable for many purposes but too slow for integrated modelling and real-time control applications. The machine learning models use a database of more than 16,000 GENRAY/CQL3D simulations for training, validation, and testing. Latin hypercube sampling ensures that the database covers the range of nine input parameters (n_e0, T_e0, I_p, B_t, R_0, n_∥, Z_eff, V_loop, and P_LHCD) with sufficient density in all regions of parameter space. The surrogate models reduce the inference time from minutes to milliseconds with high accuracy across the input parameter space.
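As a hedged illustration of the surrogate-model workflow this abstract describes, the sketch below trains a scikit-learn multilayer perceptron on Latin-hypercube-sampled inputs. Only the nine parameter count and the sampling strategy come from the abstract; the target function and all settings are synthetic stand-ins, not the GENRAY/CQL3D database or the paper's models.

```python
# Minimal surrogate-model sketch: Latin hypercube sampling + MLP regression.
# Synthetic placeholder data; the real work used >16,000 GENRAY/CQL3D runs.
import numpy as np
from scipy.stats import qmc
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
sampler = qmc.LatinHypercube(d=9, seed=0)     # nine input parameters
X = sampler.random(n=16000)                   # unit cube; rescale to physical ranges in practice
y = np.sin(X @ rng.normal(size=9))            # placeholder standing in for the simulated output

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500))
model.fit(X_train, y_train)
print("R^2 on held-out samples:", model.score(X_test, y_test))
```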
We present a new parallel algorithm for probabilistic graphical model optimization. The algorithm relies on data-parallel primitives (DPPs), which provide portable performance across hardware architectures. We evaluate results on CPUs and GPUs for an image segmentation problem. Compared to a serial baseline, we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare our performance to a reference OpenMP-based algorithm, and find speedups of up to 7X (CPU).
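Below is a minimal NumPy sketch of the data-parallel-primitive style of expression the abstract describes, applied to a toy two-class segmentation sweep: a "gather" of neighbor labels followed by a per-pixel "map" and argmin. It illustrates the programming model only; the paper's actual VTK-m/DPP implementation is not shown in this listing.

```python
# One toy optimization sweep written as data-parallel primitives:
# gather neighbor labels, map a per-pixel energy, reduce with argmin.
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((128, 128))             # observed intensities
labels = rng.integers(0, 2, img.shape)   # current two-class segmentation
means = np.array([0.25, 0.75])           # per-class intensity means (illustrative)
beta = 1.0                               # smoothness weight

# Gather: 4-neighborhood labels for every pixel (edge-padded).
padded = np.pad(labels, 1, mode="edge")
neigh = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                  padded[1:-1, :-2], padded[1:-1, 2:]])

# Map: energy of each candidate label at each pixel; reduce: per-pixel argmin.
energy = (img[None] - means[:, None, None]) ** 2 \
       + beta * (neigh[None] != np.arange(2)[:, None, None, None]).sum(axis=1)
labels = energy.argmin(axis=0)
```
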
arXiv (Cornell University), Oct 25, 2023
The processing and analysis of computed tomography (CT) imaging is important for both basic scientific development and clinical applications. In AutoCT, we provide a comprehensive pipeline that integrates an end-to-end automatic preprocessing, registration, segmentation, and quantitative analysis of 3D CT scans. The engineered pipeline enables atlas-based CT segmentation and quantification leveraging diffeomorphic transformations through efficient forward and inverse mappings. The extracted localized features from the deformation field allow for downstream statistical learning that may facilitate medical diagnostics. On a lightweight and portable software platform, AutoCT provides a new toolkit for the CT imaging community to underpin the deployment of artificial intelligence-driven applications.
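The snippet below sketches only the atlas-registration step of such a pipeline, using SimpleITK with a rigid transform as a simplified stand-in for the diffeomorphic mappings the abstract mentions; AutoCT's own API is not shown in this listing, and the file names and settings are placeholders.

```python
# Hedged atlas-registration sketch with SimpleITK (not AutoCT's API).
import SimpleITK as sitk

atlas = sitk.ReadImage("atlas_ct.nii.gz", sitk.sitkFloat32)    # placeholder paths
scan = sitk.ReadImage("subject_ct.nii.gz", sitk.sitkFloat32)

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetOptimizerAsRegularStepGradientDescent(learningRate=1.0,
                                             minStep=1e-4,
                                             numberOfIterations=200)
reg.SetInitialTransform(
    sitk.CenteredTransformInitializer(atlas, scan, sitk.Euler3DTransform()))
reg.SetInterpolator(sitk.sitkLinear)

transform = reg.Execute(atlas, scan)        # forward mapping into atlas space
aligned = sitk.Resample(scan, atlas, transform, sitk.sitkLinear, 0.0)
inverse = transform.GetInverse()            # inverse mapping back to subject space
```
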
Quantum Computing and Visualization: A Disruptive Technological Change Ahead
IEEE Computer Graphics and Applications
How to combine TerraSAR-X and CosmoSkyMed Images for a better scene understanding?
HAL (Le Centre pour la Communication Scientifique Directe), Jul 1, 2012

Building on a significant amount of current research that examines the idea of platform-portable parallel code across different types of processor families, this work focuses on two sets of related questions. First, using a performance analysis methodology that leverages multiple metrics, including hardware performance counters and elapsed time on both CPU and GPU platforms, we examine the performance differences that arise when using two common platform-portable parallel programming approaches, namely OpenMP and VTK-m, for a stencil-based computation, which serves as a proxy for many different types of computations in visualization and analytics. Second, we explore the performance differences that result when using coarser- and finer-grained parallelism approaches that are afforded by both OpenMP and VTK-m. CCS Concepts: Computing methodologies → Parallel programming languages; Theory of computation → Shared memory algorithms.
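To make the two ideas above concrete, here is a toy Python version of a 5-point stencil (a common proxy kernel in visualization and analytics), written two ways that mirror the coarser- versus finer-grained parallelism discussion: a per-row formulation, where each row is an independent task, and a per-element formulation. This is a schematic illustration, not the paper's OpenMP or VTK-m code.

```python
# A 5-point averaging stencil expressed at two task granularities.
import numpy as np

def stencil_rows(a):
    # coarse-grained view: one task per interior row
    out = a.copy()
    for i in range(1, a.shape[0] - 1):
        out[i, 1:-1] = 0.25 * (a[i - 1, 1:-1] + a[i + 1, 1:-1]
                               + a[i, :-2] + a[i, 2:])
    return out

def stencil_elements(a):
    # fine-grained view: one logical task per interior element (vectorized here)
    out = a.copy()
    out[1:-1, 1:-1] = 0.25 * (a[:-2, 1:-1] + a[2:, 1:-1]
                              + a[1:-1, :-2] + a[1:-1, 2:])
    return out

a = np.random.default_rng(0).random((512, 512))
assert np.allclose(stencil_rows(a), stencil_elements(a))
```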

arXiv (Cornell University), Oct 5, 2020
Measurements of absolute runtime are useful as a summary of performance when studying visualization and analysis methods on computational platforms of increasing concurrency and complexity. We can obtain even more insight by measuring and examining more detailed measures from hardware performance counters, such as the number of instructions executed by an algorithm implemented in a particular way, the amount of data moved to/from memory, memory hierarchy utilization levels via cache hit/miss ratios, and so forth. This work focuses on performance analysis, on modern multi-core platforms, of three different visualization and analysis kernels, each implemented in two different ways: one "traditional", using combinations of C++ and VTK, and the other using a data-parallel approach with VTK-m. Our performance study consists of measurement and reporting of several different hardware performance counters on two different multi-core CPU platforms. The results reveal interesting performance differences between these two approaches to implementing the kernels, results that would not be apparent using runtime as the only metric.
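One common way to collect the kinds of counters the abstract mentions is Linux `perf stat`; the paper does not prescribe a specific tool, so the sketch below is only an assumed setup, and the target binary name is a placeholder.

```python
# Hedged sketch: wrap `perf stat` to collect instruction and cache counters
# for a kernel binary (Linux only; "./my_kernel" is a placeholder).
import subprocess

events = "instructions,cycles,cache-references,cache-misses"
result = subprocess.run(
    ["perf", "stat", "-e", events, "./my_kernel"],
    capture_output=True, text=True)
print(result.stderr)   # perf writes its counter summary to stderr
```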

Fast detection of material deformation through structural dissimilarity
Designing materials that are resistant to extreme temperatures and brittleness relies on assessing the structural dynamics of samples. Algorithms are critically important for characterizing material deformation under stress conditions. Here, we report on our design of coarse-grained parallel algorithms for image quality assessment based on structural information and for crack detection in gigabyte-scale experimental datasets. We show how key steps can be decomposed into distinct processing flows, one based on the structural similarity (SSIM) quality measure and another on spectral content. These algorithms act upon image blocks that fit into memory and can execute independently. We discuss the scientific relevance of the problem, key developments, and the decomposition of complementary tasks into separate executions. We show how to apply SSIM to detect material degradation, and illustrate how this metric can be combined with spectral analysis for structure probing, while using tiled multi-resolution pyramids stored in HDF5 chunked multi-dimensional arrays. Results show that the proposed experimental data representation supports an average compression rate of 10X and that data compression scales linearly with the data size. We also illustrate how to correlate SSIM with crack formation, and how to use our numerical schemes to enable fast detection of deformation in 3D datasets evolving in time.
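A minimal sketch of the block-wise SSIM idea described above, using scikit-image's SSIM implementation on synthetic data: each image block is compared across two time steps, and blocks with low similarity are flagged as candidate deformation sites. The block size and threshold here are illustrative, not values from the paper.

```python
# Block-wise SSIM between two time steps to flag possible deformation.
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
t0 = rng.random((1024, 1024))
t1 = t0.copy()
t1[400:600, 400:600] += 0.5          # synthetic "crack" appearing at t1

block = 256                          # blocks sized to fit in memory
for i in range(0, t0.shape[0], block):
    for j in range(0, t0.shape[1], block):
        s = structural_similarity(t0[i:i+block, j:j+block],
                                  t1[i:i+block, j:j+block],
                                  data_range=1.5)
        if s < 0.9:                  # illustrative threshold
            print(f"block ({i},{j}): SSIM={s:.3f} -> possible deformation")
```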

Lecture Notes in Computer Science, 2020
Tomographic imaging has benefited from advances in X-ray sources, detectors, and optics that enable novel observations in science, engineering, and medicine. These advances have come with a dramatic increase in input data, in the form of faster frame rates, larger fields of view, or higher resolution, so high-performance solutions are currently widely used for analysis. Tomographic instruments can vary significantly from one to another, including in the hardware employed for reconstruction: from single-CPU workstations to large-scale hybrid CPU/GPU supercomputers. Flexibility in the software interfaces and reconstruction engines is also highly valued, to allow for easy development and prototyping. This paper presents a novel software framework for tomographic analysis that tackles all the aforementioned requirements. The proposed solution capitalizes on the increased performance of sparse matrix-vector multiplication and exploits multi-CPU and GPU reconstruction over MPI. The solution is implemented in Python and relies on CuPy for fast GPU operators and CUDA kernel integration, and on SciPy for CPU sparse matrix computation. As opposed to previous tomography solutions that are tailor-made for specific use cases or hardware, the proposed software is designed to provide flexible, portable, and high-performance operators that can be used for continuous integration in different production environments, but also for prototyping new experimental settings or for algorithmic development. The experimental results demonstrate how our implementation can even outperform state-of-the-art software packages used at advanced X-ray sources worldwide.
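A toy illustration of the sparse matrix-vector core that the abstract identifies: a projector stored as a SciPy CSR matrix driving a simple iterative reconstruction. The projector here is a random stand-in, and the iteration is a generic Landweber-style scheme, not the framework's own operators; on the GPU, cupyx.scipy.sparse offers a largely matching interface.

```python
# Sparse-matvec reconstruction sketch: x_{k+1} = x_k + step * A^T (b - A x_k).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
n_pix, n_rays = 64 * 64, 3000
A = sp.random(n_rays, n_pix, density=0.01, format="csr", random_state=0)  # stand-in projector
x_true = rng.random(n_pix)
b = A @ x_true                            # simulated measurements

x = np.zeros(n_pix)
step = 1.0 / spla.norm(A) ** 2            # crude (Frobenius-based) step bound
for _ in range(200):
    x += step * (A.T @ (b - A @ x))
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```
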
Quantum Image Pixel Library (QPIXL++) v0.1.0
OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information), Sep 17, 2021
QPIXL++ is a software package to compile quantum circuits for compressed representation of images on quantum hardware.

Scientific Reports, May 11, 2022
We introduce a novel and uniform framework for quantum pixel representations that overarches many of the most popular representations proposed in the recent literature, such as (I)FRQI, (I)NEQR, MCRQI, and (I)NCQI. The proposed QPIXL framework results in more efficient circuit implementations and significantly reduces the gate complexity for all considered quantum pixel representations. Our method scales linearly in the number of pixels and does not use ancilla qubits. Furthermore, the circuits consist only of R_y gates and CNOT gates, making them practical in the NISQ era. Additionally, we propose a circuit and image compression algorithm that is shown to be highly effective, being able to reduce the gates necessary to prepare an FRQI state for example scientific images by up to 90% without sacrificing image quality. Our algorithms are made publicly available as part of QPIXL++, a Quantum Image Pixel Library.
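The NumPy sketch below illustrates, in simplified form, the compression idea behind this result: pixel values become FRQI rotation angles, the angle vector is passed through a fast Walsh-Hadamard-style transform (as arises in decomposing uniformly controlled R_y rotations), and small coefficients are dropped, removing the corresponding R_y gates. Details such as Gray-code ordering are omitted, so treat this as a schematic, not the QPIXL++ algorithm.

```python
# Schematic angle-transform-and-threshold compression for an FRQI-style state.
import numpy as np

def fwht(a):
    # iterative fast Walsh-Hadamard transform (unnormalized, length = power of 2)
    a = a.copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            x, y = a[i:i+h].copy(), a[i+h:i+2*h].copy()
            a[i:i+h], a[i+h:i+2*h] = x + y, x - y
        h *= 2
    return a

pixels = np.random.default_rng(0).random(256)   # flattened 16x16 test image
angles = pixels * (np.pi / 2)                   # FRQI maps values to [0, pi/2]
coeffs = fwht(angles) / len(angles)
threshold = 0.01                                # compression level (illustrative)
kept = np.abs(coeffs) > threshold
print(f"R_y gates kept: {kept.sum()} / {len(coeffs)} "
      f"({100 * (1 - kept.mean()):.1f}% removed)")
```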

Scientific Reports, Dec 19, 2022
The legend of Figure 5, "256 × 256 image data of a ceramic matrix composite sample [49] acquired using microCT simulated with QPIXL++ at various compression levels and corresponding gate counts of the 17-qubit U_R circuit. The final two rows list the reduction in R_y and CNOT gates compared to the uncompressed circuits.", now reads: "28 × 28 image data from the MNIST database [47,48] simulated with QPIXL++ at various compression levels and corresponding gate counts of the 11-qubit U_R circuit. The final two rows list the reduction in R_y and CNOT gates compared to the uncompressed circuits." The legend of Figure 6, "28 × 28 image data from the MNIST database [47,48] simulated with QPIXL++ at various compression levels and corresponding gate counts of the 11-qubit U_R circuit. The final two rows list the reduction in R_y and CNOT gates compared to the uncompressed circuits.", now reads: "256 × 256 image data of a ceramic matrix composite sample [49] acquired using microCT simulated with QPIXL++ at various compression levels and corresponding gate counts of the 17-qubit U_R circuit. The final two rows list the reduction in R_y and CNOT gates compared to the uncompressed circuits." The original Article has been corrected.

Springer eBooks, 2020
This work examines the performance characteristics of multiple shared-memory implementations of a probabilistic graphical modeling (PGM) optimization code, which forms the basis for an advanced, state-of-the-art image segmentation method. The work is motivated by the need to accelerate scientific image analysis pipelines in use by experimental science, such as at X-ray light sources, and by the need for platform-portable codes that perform well across many different computational architectures. The primary focus of this work, and its main contribution, is an in-depth study of the shared-memory parallel performance of different implementations, which include alternative parallelization approaches such as C++11 threads, OpenMP, and data-parallel primitives (DPPs). Our results show that, for this complex data-intensive algorithm, the DPP implementation exhibits better runtime performance, but also less favorable scaling characteristics, than the C++11-threads and OpenMP counterparts. Based upon a set of experiments that collect hardware performance counters on multiple platforms, the runtime performance difference appears to be due primarily to algorithmic efficiency gains: reformulating the solution from its traditional C++11-threads and OpenMP expression into data-parallel primitives results in significantly fewer instructions being executed. This study is the first of its type to use hardware counters to compare methods based on VTK-m data-parallel primitives with those based on more traditional OpenMP- or threads-based parallelism. It is timely, as there is increasing awareness of the need for platform portability in light of increasing node-level parallelism and increasing device heterogeneity.

Rotational dynamics and transition mechanisms of surface-adsorbed proteins
Proceedings of the National Academy of Sciences of the United States of America, Apr 11, 2022
Significance: The exquisite organization exhibited by hybrid biomolecular–inorganic materials in nature has inspired the development of synthetic analogues for numerous applications. Nevertheless, a mechanistic picture of the energetic controls and response dynamics leading to organization is lacking. Here, we pair high-speed atomic force microscopy with machine learning and Monte Carlo simulations to analyze the rotational dynamics of rod-like proteins on a crystal lattice, simultaneously quantifying the orientational energy landscape and the transition probabilities between energetically favorable orientations. Although rotations largely follow Brownian diffusion, proteins often make large jumps in orientation, thus rapidly overcoming barriers that usually inhibit rotation. Moreover, the rotational dynamics can be tuned via protein size and solution chemistry, providing tools for controlling biomolecular assembly at inorganic interfaces.
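A schematic Metropolis Monte Carlo sketch of the qualitative picture above: rotational diffusion in a periodic orientational energy landscape, punctuated by rare large jumps that hop over barriers. The landscape, barrier height, and move sizes are all illustrative assumptions, not values fitted in the paper.

```python
# Rotational diffusion plus rare large jumps over a periodic energy landscape.
import numpy as np

rng = np.random.default_rng(0)
kT = 1.0
barrier = 4.0 * kT
energy = lambda theta: 0.5 * barrier * (1 - np.cos(6 * theta))  # six wells per turn (assumed)

theta, trajectory = 0.0, []
for _ in range(100_000):
    if rng.random() < 0.01:                  # rare large jump in orientation
        step = rng.uniform(-np.pi, np.pi)
    else:                                    # small Brownian-like rotation
        step = rng.normal(0.0, 0.05)
    proposal = (theta + step) % (2 * np.pi)
    if rng.random() < np.exp(-(energy(proposal) - energy(theta)) / kT):
        theta = proposal                     # Metropolis acceptance
    trajectory.append(theta)
```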

Nature Communications, Mar 2, 2018
Battery function is determined by the efficiency and reversibility of the electrochemical phase transformations at solid electrodes. The microscopic tools available to study the chemical states of matter with the required spatial resolution and chemical specificity are intrinsically limited, when studying complex architectures, by their reliance on two-dimensional projections of thick material. Here, we report the development of soft X-ray ptychographic tomography, which resolves chemical states in three dimensions at 11 nm spatial resolution. We study an ensemble of nano-plates of lithium iron phosphate extracted from a battery electrode at 50% state of charge. Using a set of nanoscale tomograms, we quantify the electrochemical state and resolve phase boundaries throughout the volume of individual nanoparticles. These observations reveal multiple reaction points, intra-particle heterogeneity, and size effects that highlight the importance of multi-dimensional analytical tools in providing novel insight into the design of the next generation of high-performance devices.

arXiv (Cornell University), Jan 18, 2023
Compact quantum data representations are essential to the emerging field of quantum algorithms for data analysis. We introduce two new data encoding schemes, QCrank and QBArt, which have a high degree of quantum parallelism through uniformly controlled rotation gates. QCrank encodes a sequence of real-valued data as rotations of the data qubits, allowing for high storage density. QBArt directly embeds a binary representation of the data in the computational basis, requiring fewer quantum measurements and lending itself to well-understood arithmetic operations on binary data. We present several applications of the proposed encodings for different types of data. We demonstrate quantum algorithms for DNA pattern matching, Hamming weight calculation, complex value conjugation, and retrieving an O(400)-bit image, all executed on the Quantinuum QPU. Finally, we use various cloud-accessible QPUs, including IBMQ and IonQ, to perform additional benchmarking experiments.
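To make the QBArt-style basis encoding concrete, the sketch below writes each datum as a bit pattern (the state a basis-encoded register would hold) and computes Hamming weights classically, the kind of reference one would compare QPU measurement results against. This NumPy code is a classical illustration only, not a circuit implementation, and the example data are arbitrary.

```python
# Classical reference for a basis (QBArt-style) encoding and Hamming weights.
import numpy as np

data = np.array([5, 7, 3, 9], dtype=np.uint8)   # example sequence (assumed)
n_bits = 4

# One row of bits per datum, most significant bit first.
bits = (data[:, None] >> np.arange(n_bits - 1, -1, -1)) & 1
hamming_weights = bits.sum(axis=1)
print(bits)              # bit patterns a basis-encoded register would hold
print(hamming_weights)   # expected results to check a QPU run against
```
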
arXiv (Cornell University), Sep 13, 2018
We present a new parallel algorithm for probabilistic graphical model optimization. The algorithm relies on data-parallel primitives (DPPs), which provide portable performance across hardware architectures. We evaluate results on CPUs and GPUs for an image segmentation problem. Compared to a serial baseline, we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare our performance to a reference OpenMP-based algorithm, and find speedups of up to 7X (CPU).

bioRxiv (Cold Spring Harbor Laboratory), Apr 9, 2020
We describe how to use several machine learning techniques, organized in a learning pipeline, to segment and identify subcellular structures in cryo-electron tomograms, which are difficult to analyze with traditional segmentation tools. The learning pipeline in our approach starts with supervised learning via a special convolutional neural network trained with simulated data. It continues with semi-supervised reinforcement learning and/or region merging techniques that try to piece together disconnected components that should belong to the same subcellular structure. A parametric or non-parametric fitting procedure is then used to enhance the segmentation results and quantify uncertainties in the fitting. Domain knowledge is used in generating the training data for the neural network and in guiding the fitting procedure through the use of appropriately chosen priors and constraints. We demonstrate that the approach proposed here works well for extracting membrane surfaces of protein-reconstituted liposomes in a cellular environment that contains other artifacts.
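As a hedged sketch of the supervised first stage described above, here is a small 3D convolutional network in PyTorch trained on random stand-in volumes to produce voxel-wise class scores. The architecture, volume sizes, and "simulated" data are placeholders for illustration, not the paper's network or training set.

```python
# Minimal 3D CNN for voxel-wise segmentation, trained on stand-in data.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(8, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(8, 2, kernel_size=1),            # 2 classes: membrane / background
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(10):
    volume = torch.randn(4, 1, 32, 32, 32)     # batch of simulated sub-tomograms
    target = torch.randint(0, 2, (4, 32, 32, 32))  # placeholder voxel labels
    loss = loss_fn(net(volume), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```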

Machine learning for micro-tomography
Machine learning has revolutionized a number of fields, but many micro-tomography users have never used it for their work. The micro-tomography beamline at the Advanced Light Source (ALS), in collaboration with the Center for Applied Mathematics for Energy Research Applications (CAMERA) at Lawrence Berkeley National Laboratory, has now deployed a series of tools that use machine learning to automate data processing for ALS users. These include new reconstruction algorithms, feature extraction tools, and image classification and recommendation systems for scientific images. Some of these tools run either in automated pipelines that operate on data as it is collected or as stand-alone software. Others are deployed on computing resources at Berkeley Lab, from workstations to supercomputers, and made accessible to users through either scripting or easy-to-use graphical interfaces. This paper presents a progress report on this work.