CP2K: Automation, Scripting, Testing

Tiziano Müller
tiziano.mueller@chem.uzh.ch
CP2K Workshop @ UGent, 11.-13. March 2019
Dept. of Chemistry, UZH
Outline

• Preparations
  • Installation
  • Verification
• Reproducibility
  • Syntax-Checking & Input-Debugging
  • Archival
• Input Generation
  • Structure-only: Supported formats
  • Full Input generation: GUIs
  • Full Input generation: Scripting
  • CP2K Preprocessor
• “Run” Automation
  • Batch Processing
  • Workflows
• Integration: Phonopy, PyRETIS, i-PI
• Basis Set Verification
• Performance Optimisation

Preparations
Building CP2K with the toolchain scripts

• Default: Uses system compiler, linker and MPI
• Builds and configures for: libxc, libint, libxsmm, ELPA, SIRIUS
• Support for Linux & macOS

$ git clone --recursive https://github.com/cp2k/cp2k.git
$ cd cp2k/tools/toolchain
$ ./install_cp2k_toolchain.sh
MPI is detected and it appears to be OpenMPI
nvcc not found, disabling CUDA by default
Compiling with 8 processes.
==================== Finding binutils from system paths ====================
[...]
==================== generating arch files ====================
arch files can be found in the /data/cp2k/tools/toolchain/install/arch subdirectory
Wrote /data/cp2k/tools/toolchain/install/arch/local.sopt
Wrote /data/cp2k/tools/toolchain/install/arch/local.sdbg
Wrote /data/cp2k/tools/toolchain/install/arch/local.ssmp
[...]
========================== usage =========================
Done!
Now copy:
  cp /data/cp2k/tools/toolchain/install/arch/* to the cp2k/arch/ directory
To use the installed tools and libraries and cp2k version
compiled with it you will first need to execute at the prompt:
  source /data/cp2k/tools/toolchain/install/setup
To build CP2K you should change directory:
  cd cp2k/
  make -j 8 ARCH=local VERSION="sopt sdbg ssmp popt pdbg psmp"
Building CP2K with the toolchain scripts: Configuration

• Build everything:
$ ./install_cp2k_toolchain.sh --install-all

• More options:
$ ./install_cp2k_toolchain.sh --help

→ Fortran requires .mod files and the code using them to be built with the same compiler!
• Manual clean (build/, install/) recommended after re-configuration
• Check https://www.cp2k.org/dev:compiler_support for supported compilers and libraries

Building CP2K with Spack

$ git clone https://github.com/spack/spack.git
$ cd spack
$ . ./share/spack/setup-env.sh
$ spack install cp2k
$ spack load cp2k

• Package manager for scientific software
• Requires Python
• Automatically detects and reuses available compiler
• Recursively builds all CP2K prerequisites
• Installs CP2K and the arch-file used to build it

Building CP2K with Spack: Configuration
$ spack info cp2k
MakefilePackage: cp2k

Description:
CP2K is a quantum chemistry and solid state physics software package
that can perform atomistic simulations of solid state, liquid,
molecular, periodic, material, crystal, and biological systems

Homepage: https://www.cp2k.org

Tags:
None

Preferred version:
6.1 https://github.com/cp2k/cp2k/releases/download/v6.1.0/cp2k-6.1.tar.bz2

Safe versions:
6.1 https://github.com/cp2k/cp2k/releases/download/v6.1.0/cp2k-6.1.tar.bz2
[...]

Variants:
    Name [Default]     Allowed values               Description

    blas [openblas]    openblas, mkl, accelerate    Enable the use of OpenBlas/MKL/Accelerate
    elpa [off]         True, False                  Enable optimised [...]

Building CP2K with Spack: Locating the arch-file

$ spack find -p cp2k


==> 1 installed package
-- linux-opensuse_leap15-x86_64 / gcc@7.3.1 ---------------------
cp2k@6.1 [...]/linux-opensuse_leap15-x86_64/gcc-7.3.1/cp2k-6.1-byjtwnyhrqqmzezvpy3zwiccccmexshd

$ ls [...]/cp2k-6.1-byjtwnyhrqqmzezvpy3zwiccccmexshd/.spack/archived-files/arch/
linux-opensuse_leap15-x86_64-gcc.popt

• By default Spack builds all dependencies except compiler & linker. Extra configuration is needed to use the system MPI.
• Alternative: → Use the Spack arch-file for a custom build of CP2K with Spack-installed libraries

Testing CP2K

State of latest version

https://dashboard.cp2k.org
Multiple platforms/architectures available, including full logs and their arch-files.

Verify your build

$ make VERSION=sopt ARCH=local test
CP2K supports: cp2kflags: libint fftw3 libxc libderiv_max_am1=5 libint_
Skipping QS/regtest-cdft-hirshfeld-2 : missing required feature : parall
Skipping QS/regtest-cdft-hirshfeld-2 : missing required feature : mpiran
[...]
--------------------------------- Summary ------------------------------
Number of FAILED tests 0
Number of WRONG tests 0
Number of CORRECT tests 3031
Number of NEW tests 0
Total number of tests 3031
GREPME 0 0 3031 0 3031 X
Summary: correct: 3031 / 3031; 6min
Status: OK
------------------------------------------------------------------------
Regtest took 379.00 seconds.
------------------------------------------------------------------------
Thu Feb 28 15:33:59 CET 2019
*************************** testing ended ******************************

→ Automatically skips unavailable features

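The summary block printed at the end of the regression test lends itself to automated checking, for example in a build script. The following is a minimal sketch in Python, assuming the log contains the "Number of ... tests" lines exactly as shown on the slide; the function names `parse_regtest_summary` and `build_is_ok` are hypothetical helpers and not part of CP2K.

```python
import re


def parse_regtest_summary(log_text):
    """Collect the counters from the 'Summary' block of a regtest log."""
    counts = {}
    for kind in ("FAILED", "WRONG", "CORRECT", "NEW"):
        match = re.search(rf"Number of {kind}\s+tests\s+(\d+)", log_text)
        if match:
            counts[kind.lower()] = int(match.group(1))
    total = re.search(r"Total number of tests\s+(\d+)", log_text)
    if total:
        counts["total"] = int(total.group(1))
    return counts


def build_is_ok(counts):
    """Pass only if the FAILED and WRONG counters are both present and zero."""
    return counts.get("failed") == 0 and counts.get("wrong") == 0


# Summary block as shown on the slide
example_log = """\
Number of FAILED tests 0
Number of WRONG tests 0
Number of CORRECT tests 3031
Number of NEW tests 0
Total number of tests 3031
"""

counts = parse_regtest_summary(example_log)
print(counts, build_is_ok(counts))
```

Treating a missing counter as a failure is a deliberate choice here: a truncated log should not be mistaken for a verified build.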
Reproducibility
Input-Debugging & Output-Capturing

• Basic issues are found
• Complex tests only at full runtime
→ Use low cutoffs, limit SCF cycles to get a full check (MAX_SCF, MAX_STEPS, …)

$ cp2k -c your.inp
*******************************************************************************
*   ___                                                                       *
*  /   \                                                                      *
* [ABORT]                                                                     *
*  \___/    found an unknown keyword APACHE in section __ROOT__               *
*    |                                                                        *
*  O/|                                                                        *
* /| |                                                                        *
* / \                                               input/input_parsing.F:246 *
*******************************************************************************


Output capturing:
$ cp2k your.inp |& tee your.out
$ cp2k your.inp -o your.out (production run)
→ Leave error output handling to batch-system if possible
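For batch preparation it can be convenient to run the syntax check on many inputs before submitting them. Below is a minimal sketch in Python that wraps the `cp2k -c` call shown above. The executable name `cp2k` follows the slide and may differ on a given installation; the helper names are hypothetical. Whether the check signals problems through its exit status is an assumption, so the sketch also looks for the `[ABORT]` marker in the captured output.

```python
import subprocess


def syntax_check_command(input_file, executable="cp2k"):
    """Build the command line for a syntax-only check (-c as on the slide)."""
    return [executable, "-c", input_file]


def looks_aborted(output):
    """Detect the [ABORT] banner that CP2K prints for parsing errors."""
    return "[ABORT]" in output


def check_input(input_file, executable="cp2k"):
    """Run the check and return (ok, captured output)."""
    result = subprocess.run(
        syntax_check_command(input_file, executable),
        capture_output=True,
        text=True,
    )
    output = result.stdout + result.stderr
    # Assumption: a failed check yields a non-zero exit status and/or an ABORT banner
    ok = result.returncode == 0 and not looks_aborted(output)
    return ok, output
```

A typical use would be `ok, log = check_input("your.inp")` inside a loop over all generated inputs, writing `log` to a file for every input that fails.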
Data Archival

$ cp2k -e your.inp
• Full-input: includes current default settings & resolved preprocessor variables
• Can also be used for debugging complex inputs and parsing errors
• Other artefacts:
  • POTENTIAL
  • BASIS_SET
  • Structure files: .xyz, .pdb, …
  • Force field, dispersion correction parameter, DFTB files
  • proj-1.restart (a full input file)
  • proj-pos-1.xyz (MD/GEO_OPT trajectories)
  • proj-1.ener (MD energies, temperature, …)
  • proj-1.cell (cell parameters for CELL_OPT, NPT MD)
  • proj-RESTART.wfn, proj-RESTART.kp (orbitals for restart)

Input Generation
Structure-only: Supported formats

Use your favorite molecular structure editor.

Supported formats [1]:
XYZ (coords only), PDB, CIF,
G96/G87 (GROMACS), PSF/UPSF (CHARMM), CRD (AMBER), XTL

[1] *.restart files have coordinates integrated as &COORD section

Full Input generation: Avogadro

Avogadro
• CP2K Plugin for Avogadro 1.x:
  https://github.com/brhr-iwao/libavogadro1cp2k
• CP2K Plugin for Avogadro 2.x:
  https://github.com/svedruziclab/avogadrolibs-cp2k

Full Input generation: Chimera

Chimera: TETR+LEV00 Plugin for Chimera
• Menu-driven + visualisation
• TETR: pre-processing
  • geometry setup
  • supercell, surfaces, clusters calculation
• LEV00: analysis
  • Visualising charge & spin densities
  • DOS, Phonons, IR spectra

Full Input generation: PYCP2K

PYCP2K
• Domain Specific Language (DSL) with Python
• Keywords match CP2K input file syntax
• Integration with Python ASE
• Auto-completion based on Python auto-completion engines

from pycp2k import CP2K
from ase.lattice.cubic import Diamond

#====================== Create the structure with ASE ==========================
lattice = Diamond(directions=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
                  symbol='Si',
                  latticeconstant=5.430697500,
                  size=(1, 1, 1))

#================= Define and setup the calculator object ======================
calc = CP2K()
calc.working_directory = "./"
calc.project_name = "si_bulk"
calc.mpi_n_processes = 2

#================= An existing input file can be parsed =======================
calc.parse("template.in")

#==================== Define shortcuts for easy access =========================
CP2K_INPUT = calc.CP2K_INPUT
GLOBAL = CP2K_INPUT.GLOBAL
FORCE_EVAL = CP2K_INPUT.FORCE_EVAL_add()  # Repeatable items have to be first created
SUBSYS = FORCE_EVAL.SUBSYS
DFT = FORCE_EVAL.DFT
SCF = DFT.SCF

#======================= Write the simulation input ============================
GLOBAL.Run_type = "ENERGY_FORCE"
FORCE_EVAL.Method = "Quickstep"

Full Input generation: Python ASE

The Atomic Simulation Environment
• Powerful structure building tools
• Merging with pre-existing input file structure possible (templating)
• Uses cp2k_shell to run CP2K continuously → minimal overhead
• Can start CP2K on remote machine

import numpy as np
from ase.build import bulk
from ase.constraints import UnitCellFilter
from ase.optimize import MDMin
from ase.calculators.cp2k import CP2K

inp = """&FORCE_EVAL
  &MM
    &FORCEFIELD
      &SPLINE
        EMAX_ACCURACY 500.0
        EMAX_SPLINE 1000.0
        EPS_SPLINE 1.0E-9
      &END
      &NONBONDED
        &LENNARD-JONES
          atoms Ar Ar
          EPSILON [eV] 1.0
          SIGMA [angstrom] 1.0
          RCUT [angstrom] 10.0
        &END LENNARD-JONES
[...]
&END FORCE_EVAL"""

calc = CP2K(label="test_stress", inp=inp, force_eval_method="Fist")

# Theoretical infinite-cutoff LJ FCC unit cell parameters

CP2K Internal Input Preprocessor

CP2K input may include extra directives which are evaluated before everything else:

@INCLUDE 'filename.inc'
    The contents of the specified file are included at this point. The path is
    assumed to be relative to the current working directory.
@SET VAR value
    (Re-)define a variable.
${VAR} or $VAR
    Replaced by the content of the previously set variable VAR.
@IF …/@ENDIF
    Conditionals. Supported operators: == and /= (lexical comparison). The value
    0 or whitespace evaluates to FALSE, everything else to TRUE.
@PRINT …
    Print the given text while pre-processing.
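To make the interplay of `@SET` and `${VAR}`/`$VAR` concrete, here is a small Python sketch that mimics only these two directives. It is an illustration of the substitution idea under simplifying assumptions (no `@INCLUDE`, `@IF` or `@PRINT`, variable names limited to word characters) and is not CP2K's actual parser; `expand_variables` is a hypothetical name.

```python
import re

# Matches ${VAR} or $VAR (assumption: variable names consist of word characters)
VARIABLE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand_variables(lines):
    """Apply @SET definitions and substitute variables, line by line."""
    variables = {}
    output = []

    def substitute(match):
        name = match.group(1) or match.group(2)
        return variables[name]  # KeyError if the variable was never set

    for line in lines:
        # Substitute first, so that a @SET value may reuse earlier variables
        expanded = VARIABLE.sub(substitute, line)
        stripped = expanded.strip()
        if stripped.upper().startswith("@SET "):
            _, name, value = stripped.split(None, 2)
            variables[name] = value  # (re-)define the variable
            continue  # directives do not appear in the expanded input
        output.append(expanded)
    return output


example = [
    "@SET CUTOFF 400",
    "@SET BASIS DZVP",
    "CUTOFF ${CUTOFF}",
    "BASIS_SET $BASIS",
]
print(expand_variables(example))
```

The fully resolved input that CP2K itself would use can be obtained with `cp2k -e your.inp`, as described under Data Archival.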
CP2K Internal Input Preprocessor: Example

HOME
└── HighThroughputProject
    ├── base.inp ............... @INCLUDE 'settings.inp'
    ├── structure1
    │   ├── settings.inp
    │   └── structure.xyz
    ├── structure2 ............. start CP2K in this directory
    │   ├── settings.inp ....... inclusion relative to CWD
    │   └── structure.xyz
    └── structure3

→ settings.inp can contain @SET, other @INCLUDE or full sections/keywords
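A layout like this is easy to generate from a script. The sketch below, in Python, writes one directory per structure with its own `settings.inp` made of `@SET` lines. The variable name `CUTOFF`, its values, and the helper name `write_project` are hypothetical and chosen for illustration only; a real `base.inp` would of course contain the full input in addition to the `@INCLUDE` line.

```python
from pathlib import Path


def write_project(root, structures):
    """Create root/base.inp and one sub-directory with settings.inp per structure."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # Shared input: the include is resolved relative to the working directory
    (root / "base.inp").write_text("@INCLUDE 'settings.inp'\n")
    for name, settings in structures.items():
        run_dir = root / name
        run_dir.mkdir(exist_ok=True)
        lines = [f"@SET {key} {value}" for key, value in settings.items()]
        (run_dir / "settings.inp").write_text("\n".join(lines) + "\n")
    return root


if __name__ == "__main__":
    write_project(
        "HighThroughputProject",
        {
            "structure1": {"CUTOFF": 400},  # hypothetical per-structure values
            "structure2": {"CUTOFF": 600},
        },
    )
```

Each calculation is then started from inside its structure directory with the shared `base.inp`, so that the include picks up the local `settings.inp`.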

“Run” Automation
“Run” automation

• Shell Scripts
• Python Scripts
• Batch Processing: CP2K Farming, Scheduler assisted
• Workflows: AiiDA, atomate

Batch Processing: CP2K Farming

• Jobs are run inside the same CP2K process
• MPI gets initialized once → reduced startup time
• Useful for many small jobs

&GLOBAL
  PROJECT OldMacDonald
  PROGRAM FARMING
  RUN_TYPE NONE                     ! set to NONE
&END GLOBAL
&FARMING                            ! use FARMING section
  NGROUPS 2                         ! number of parallel jobs
  MASTER_SLAVE                      ! for load balancing
  GROUP_SIZE 42                     ! number of processors per group, default: 8
  &JOB
    JOB_ID 1                        ! optional, required for dependencies
    DIRECTORY dir-1
    INPUT_FILE_NAME water.inp       ! insert original input here
    OUTPUT_FILE_NAME water.out
  &END JOB
  &JOB
    DEPENDENCIES 1
    DIRECTORY dir-2
    INPUT_FILE_NAME water.inp
    OUTPUT_FILE_NAME more_water.out
  &END JOB
  [...]
&END FARMING
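With many small jobs, writing the `&JOB` blocks by hand gets tedious. The following Python sketch generates a farming input of the shape shown above from a list of directories. It uses only keywords that appear on the slide; the function name `farming_input` and its default file names are illustrative assumptions, and options such as `MASTER_SLAVE`, `GROUP_SIZE` or job dependencies are left out for brevity.

```python
def farming_input(job_dirs, project="OldMacDonald", ngroups=2,
                  input_name="water.inp", output_name="water.out"):
    """Return a CP2K farming input with one &JOB block per directory."""
    lines = [
        "&GLOBAL",
        f"  PROJECT {project}",
        "  PROGRAM FARMING",
        "  RUN_TYPE NONE",
        "&END GLOBAL",
        "&FARMING",
        f"  NGROUPS {ngroups}",
    ]
    for directory in job_dirs:
        lines += [
            "  &JOB",
            f"    DIRECTORY {directory}",
            f"    INPUT_FILE_NAME {input_name}",
            f"    OUTPUT_FILE_NAME {output_name}",
            "  &END JOB",
        ]
    lines.append("&END FARMING")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(farming_input([f"dir-{i}" for i in range(1, 4)]))
```

The generated text can be validated with the syntax check (`cp2k -c`) before the farming run is submitted.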

Workflows: AiiDA

• Python-based Automated Interactive Infrastructure and Database
• Strong focus on Data Provenance
• Database backend (PostgreSQL) + File Repository
• Advanced workflow engine on top of Python
• Plugin architecture:
  • CP2K Plugin
  • Gaussian Basis Set and Pseudopotential Plugin
  • more in the AiiDA Plugin Registry
• Jupyter Notebook integration
• Integration with the MaterialsCloud Open Science platform

Workflows: AiiDA Example
• Runs on your machine
• Manages job submission and retrieval (incl. scheduler support)
• Tracks structures & calculations

calc = Code.get_from_string("cp2k").new_calc()
calc.label = "Awesome CP2K calculation"

atoms = ase.build.molecule('H2O')  # build structure
atoms.center(vacuum=2.0)
structure = StructureData(ase=atoms)
calc.use_structure(structure)  # ... or reuse existing

parameters = ParameterData(dict={
    'FORCE_EVAL': {
        'METHOD': 'Quickstep',
        'DFT': {
            'QS': {
                'EPS_DEFAULT': 1.0e-12,
            },
            [...]
        }
    })
calc.use_parameters(parameters)

calc.set_max_wallclock_seconds(3*60)
calc.set_resources({"num_machines": 4})
calc.set_computer(Computer.get("skitty"))

calc.store_all()  # store in database
calc.submit()  # submit for calculation

[Figure: AiiDA Data Provenance Graph]

Integration: Phonopy, PyRETIS, i-PI
Phonopy: Phonon calculation with CP2K

• Python based
• File-based interface: parses & generates code inputs
• Only needs equilibrated and symmetrized crystal structure input
• Uses a supercell approach
• Can also generate: DOS, pDOS, Thermal properties

Workflow: CP2K Input → phonopy (pre) → CP2K Input w/ atomic displacements → Energies & Forces → phonopy (post) → Phonon band structure

PyRETIS: Transition Interface Sampling

PyRETIS: rare events in Python
• Python based
• Transition Interface Sampling (TIS) and Replica Exchange Transition Interface Sampling (RETIS)
• Can use CP2K as integrator for MD steps

Workflow: CP2K + PyRETIS Input → PyRETIS → Crossing probability/rate constant

i-PI: a universal force engine

• Python based
• Focus on Path Integral Molecular Dynamics
• Communication with Force Engines via network sockets
• Many additional methods available

Workflow: i-PI configuration → i-PI (Sockets) → MD Trajectory

Basis Set Verification
Basis Set Verification: The challenge

“For GTOs, a triple-ζ quality basis has mean errors of ~10 kcal/mol in total energies, while chemical accuracy is almost reached for a quintuple-ζ basis...”

→ Do we really need larger basis sets?

Stig Rune Jensen et al. In: J. Phys. Chem. Lett. 8.7 (Apr. 6, 2017), pp. 1449–1457. doi: 10.1021/acs.jpclett.7b00255

Basis Set Verification: The challenge

“We show that by choosing Gaussian basis sets optimized for density functional theory, basis set methods are capable of achieving accuracy comparable to that from the multiwavelet approach...”

→ Not necessarily, just use the right one.

Frank Jensen. In: J. Phys. Chem. A 121.32 (Aug. 17, 2017), pp. 6104–6107. doi: 10.1021/acs.jpca.7b04760

Basis Set Verification: The Deltatest

∆-Gauge [3]:

$$\Delta_i(a, b) = \sqrt{\frac{\int_{0.94 V_{0,i}}^{1.06 V_{0,i}} \left( E_{b,i}(V) - E_{a,i}(V) \right)^2 \, \mathrm{d}V}{0.12 \, V_{0,i}}}$$

• Solid-state benchmark [2]
• Measure: Difference between two Volume/Energy-curves
• 40+ “Methods”, 71 Elements (H-Rn, elemental crystals)
• DFT, PBE Functional
• All-Electron calculations as reference

[Figure: two Energy vs. Volume curves]

[2] Kurt Lejaeghere et al. In: Science 351.6280 (Mar. 25, 2016), aad3000. doi: 10.1126/science.aad3000
[3] Kurt Lejaeghere et al. In: Crit. Rev. Solid State 39.1 (Jan. 1, 2014), pp. 1–24. doi: 10.1080/10408436.2013.772503
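The ∆-gauge is the root-mean-square difference between two energy/volume curves over a ±6 % window around the equilibrium volume. As a rough numerical illustration, the Python sketch below integrates the squared difference with the trapezoidal rule. It assumes the two curves are available as callables E(V) and that the reference volume `v0` is supplied by the caller; how the curves are fitted and aligned in the actual Deltatest is not covered here, and `delta_gauge` is a hypothetical name.

```python
import math


def delta_gauge(energy_a, energy_b, v0, n_points=2001):
    """RMS energy difference between two E(V) curves on [0.94*v0, 1.06*v0]."""
    lower, upper = 0.94 * v0, 1.06 * v0
    step = (upper - lower) / (n_points - 1)
    integral = 0.0
    for k in range(n_points):
        volume = lower + k * step
        diff_squared = (energy_b(volume) - energy_a(volume)) ** 2
        # Trapezoidal rule: end points carry half weight
        weight = 0.5 if k in (0, n_points - 1) else 1.0
        integral += weight * diff_squared * step
    return math.sqrt(integral / (0.12 * v0))


# Toy curves (arbitrary units): same parabola, shifted by a constant 0.003
def curve_a(volume):
    return 0.5 * (volume - 20.0) ** 2


def curve_b(volume):
    return curve_a(volume) + 0.003


print(delta_gauge(curve_a, curve_b, v0=20.0))
```

For a constant offset the integral reduces to the offset itself, which makes this toy case a convenient sanity check for the normalisation by 0.12 V0.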
Basis Set Verification: Deltatest results for CP2K’s MOLOPT Basis Set

[Figure: ∆-value per element (H to Rn). Panels: CP2K with DZVP-SR, TZVP-SR, TZV2P-SR and TZV2PX-SR vs Abinit; CP2K with DZVP, TZVP, TZV2P and TZV2PX (non-SR MOLOPT basis sets) vs Abinit for H, C, N, O, F, Si, P, S, Cl, Cu, Br; CP2K vs. WIEN2k and Abinit, GTH-PBE vs. WIEN2k.]

Basis Set Verification: Deltatest results for CP2K’s MOLOPT Basis Set
[Figure: ∆-value per element (H to Rn) for CP2K with SR and non-SR MOLOPT basis sets (DZVP, TZVP, TZV2P, TZV2PX) vs Abinit, together with CP2K vs. WIEN2k and Abinit, GTH-PBE vs. WIEN2k. Annotation: larger Basis Set.]

Basis Set Verification: CP2K MOLOPT Deltatest Summary & Outlook

Summary:
• MOLOPT basis set suitable for solid state calculations
• Larger-ζ MOLOPT basis sets systematically improve results
• Basis set related errors in same order as pseudization error
• For some elements: basis set inadvertently compensates pseudopotential error

Outlook:
• Publication of complete data and workflow in Discovery section of http://www.materialscloud.org
• Testing of All-Electron Peintinger [4] basis set in progress
• More benchmarks coming

[4] Michael F. Peintinger et al. In: J. Comput. Chem. 34.6 (2013), pp. 451–459. doi: 10.1002/jcc.23153

Performance Optimisation
CP2K Timing Example
SUBROUTINE name contains method and step descriptors:
  pw    Planewave
  fft   Fast Fourier Transformation
  mp    Message Passing (MPI)
  qs    Quickstep
  scf   Self-consistent field
ASD: measure for how deeply nested a function is
SELF TIME: time spent in the routine and in subroutines that are not timed separately

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1  1.0    0.847    0.890 2709.628 2709.629
 qs_energies                         1  2.0    0.000    0.000 2708.215 2708.215
 scf_env_do_scf                      1  3.0    0.001    0.002 2705.607 2705.607
 scf_env_do_scf_inner_loop         630  4.0    0.038    0.656 2607.232 2607.239
 qs_ks_update_qs_env               651  5.0    0.005    0.006 1789.360 1789.383
 rebuild_ks_matrix                 630  6.0    0.002    0.002 1788.729 1788.736
 qs_ks_build_kohn_sham_matrix      630  7.0    0.080    0.088 1788.728 1788.734
 pw_transfer                     18285  9.3    1.136    1.356 1410.183 1433.258
 fft_wrap_pw1pw2                 18285 10.3    0.171    0.207 1409.047 1432.022
 fft_wrap_pw1pw2_400              9875 11.7  148.093  160.013 1355.429 1377.554
 qs_vxc_create                     630  8.0    0.010    0.012 1048.606 1049.428
 fft3d_ps                        18285 12.3  615.774  642.063 1002.533 1028.016
 qs_rho_update_rho                 631  5.0    0.005    0.005  896.809  896.810
 calculate_rho_elec               1262  6.0   19.360   65.700  896.804  896.806
 density_rs2pw                    1262  7.0    0.128    0.140  815.787  829.917
 xc_rho_set_and_dset_create        630 10.0   19.980   21.160  721.769  777.824
 rs_pw_transfer                  10096  8.7    0.142    0.176  558.560  573.389
 xc_vxc_pw_create                  210  9.0    9.400   10.387  531.224  532.044
 xc_exc_calc                       420  9.0    1.469    1.582  517.372  517.373
 pw_poisson_solve                  630  8.0   19.270   21.048  450.545  450.547
 rs_pw_transfer_RS2PW_400         1264  9.0  185.950  197.914  263.644  280.272
 [...]
 mp_alltoall_z22v                18285 14.3  173.031  215.341  173.031  215.341
 mp_waitany                     141448 10.7  140.483  179.715  140.483  179.715
 pw_scatter_p                     8402 13.3  168.017  174.224  168.017  174.224

CP2K Timing Guidelines

• Check that I/O routines are < 50% of total runtime


→ Remember Amdahl’s law: scaling flattens eventually
→ Do you really need to write so much/often?
→ Are you running in the right directory?
• Compare Average and Maximum values
→ large difference could mean that nodes are waiting for single rank to finish
• Check settings for respective sections
→ Are you using the right algorithms?
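The comparison of Average and Maximum values can be automated for long timing reports. The Python sketch below parses rows of the timing table (seven whitespace-separated fields, as in the example output) and reports the relative gap between the maximum and average total time per routine. The helper names and the 10 % threshold are hypothetical choices for illustration, not a CP2K recommendation.

```python
def parse_timing_line(line):
    """Parse one timing row; return None for headers and other lines."""
    fields = line.split()
    if len(fields) != 7:
        return None
    try:
        calls = int(fields[1])
        asd, self_avg, self_max, total_avg, total_max = (float(x) for x in fields[2:])
    except ValueError:
        return None  # e.g. the column header line
    return {
        "name": fields[0], "calls": calls, "asd": asd,
        "self_avg": self_avg, "self_max": self_max,
        "total_avg": total_avg, "total_max": total_max,
    }


def imbalance(row):
    """Relative gap between MAXIMUM and AVERAGE total time."""
    if row["total_avg"] == 0.0:
        return 0.0
    return (row["total_max"] - row["total_avg"]) / row["total_avg"]


def imbalanced_routines(report, threshold=0.10):
    """Names of routines whose max/average gap exceeds the threshold."""
    rows = (parse_timing_line(line) for line in report.splitlines())
    return [row["name"] for row in rows if row and imbalance(row) > threshold]


# Rows taken from the timing example
report = """\
 SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
 fft3d_ps 18285 12.3 615.774 642.063 1002.533 1028.016
 mp_alltoall_z22v 18285 14.3 173.031 215.341 173.031 215.341
 mp_waitany 141448 10.7 140.483 179.715 140.483 179.715
"""
print(imbalanced_routines(report))
```

A large gap in a message-passing routine such as `mp_waitany` is consistent with the guideline above: some ranks are waiting for others to finish.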

General Guidelines

• Optimization starts with you: Biggest gains by proper setup
  • Cell size
  • SCF settings, preconditioner
  • Choice of basis set
  • ADMM
• No universal recipe, check scaling of your system
  • Run a small number of MD or GEO_OPT steps
  • Turn off outer-SCF, keep inner-SCF fixed
• When compiling yourself:
  • Use vendor-supplied BLAS, LAPACK, FFTW3 libraries
  • Build and use libxsmm, ELPA
  • CUDA support available, improvements are coming

Getting Help

• https://www.cp2k.org: Exercises, Lecture Slides
• https://manual.cp2k.org: Input File reference
• <CP2K-SOURCE>/tests: Minimal Working Examples
• https://groups.google.com/group/cp2k: Google Group/Forum
• https://github.com/cp2k/cp2k/issues: Issue Tracker

Thank you!