ImportError `undefined symbol: iJIT_NotifyEvent` encountered when MKL 2024.1 is installed. #123097

LiutongZhou · 2024-04-01T15:49:56Z

The bug

Importing torch raises `undefined symbol: iJIT_NotifyEvent` from torch/lib/libtorch_cpu.so when PyTorch and MKL 2024.1+ are installed together. Downgrading MKL to 2024.0.0 resolves it.

import torch
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
----> 1 import torch

File ~/.../lib/python3.10/site-packages/torch/__init__.py:237
    235     if USE_GLOBAL_DEPS:
    236         _load_global_deps()
--> 237     from torch._C import *  # noqa: F403
    239 # Appease the type checker; ordinarily this binding is inserted by the
    240 # torch._C module initialization code in C
    241 if TYPE_CHECKING:

ImportError: /.../lib/python3.10/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent

To Reproduce

mamba create -y -n test_pytorch_mkl python=3.10 pytorch=2.2 pytorch-cuda=12.1 mkl=2024.1 -c pytorch  -c nvidia -c intel
mamba activate test_pytorch_mkl
python -c "import torch"
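To make the failure easier to report from a script, the loader message can be parsed for the library and the missing symbol. The following is a minimal sketch, not part of the original report; the helper names `parse_undefined_symbol` and `check_torch_import` are hypothetical, and the regular expression assumes the message has the same shape as the ImportError shown above.

```python
import re

# Assumption: the loader message looks like
# "<path>/libtorch_cpu.so: undefined symbol: <name>", as in the traceback above.
_UNDEFINED_SYMBOL = re.compile(
    r"(?P<lib>\S+\.so[\w.]*): undefined symbol: (?P<symbol>\S+)"
)


def parse_undefined_symbol(message):
    """Return (library_path, symbol) from a loader error message, or None."""
    match = _UNDEFINED_SYMBOL.search(message)
    if match is None:
        return None
    return match.group("lib"), match.group("symbol")


def check_torch_import():
    """Try to import torch and report the missing symbol if loading fails.

    Returns None when the import succeeds, and also when it fails for a
    different reason (for example, torch is not installed at all).
    """
    try:
        import torch  # noqa: F401
    except ImportError as exc:
        return parse_undefined_symbol(str(exc))
    return None
```

Calling `check_torch_import()` inside the environment created above would be expected to return the path to libtorch_cpu.so together with the symbol name when this bug is hit.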

Versions

PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.10.13 (tags/v3.10.13-25-g07fbd8e9251-dirty:07fbd8e9251, Sep 27 2023, 23:32:09) [GCC 13.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-174-generic-x86_64-with-glibc2.31
Is CUDA available: N/A
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: Tesla V100-SXM2-32GB
Nvidia driver version: 525.60.13
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.6.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      46 bits physical, 48 bits virtual
CPU(s):                             32
On-line CPU(s) list:                0-31
Thread(s) per core:                 1
Core(s) per socket:                 32
Socket(s):                          1
NUMA node(s):                       1
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              85
Model name:                         Intel Xeon Processor (Skylake)
Stepping:                           4
CPU MHz:                            2394.374
BogoMIPS:                           4788.74
Virtualization:                     VT-x
Hypervisor vendor:                  KVM
Virtualization type:                full
L1d cache:                          1 MiB
L1i cache:                          1 MiB
L2 cache:                           128 MiB
L3 cache:                           16 MiB
NUMA node0 CPU(s):                  0-31
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Mitigation; PTE Inversion; VMX flush not necessary, SMT disabled
Vulnerability Mds:                  Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown:             Mitigation; PTI
Vulnerability Mmio stale data:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:             Vulnerable
Vulnerability Spec store bypass:    Vulnerable
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke arch_capabilities

Versions of relevant libraries:
[pip3] torch==2.2.2
[pip3] triton==2.2.0
[conda] blas                      1.0                         mkl    intel
[conda] mkl                       2024.1.0              intel_691    intel
[conda] pytorch                   2.2.2           py3.10_cuda12.1_cudnn8.9.2_0    pytorch
[conda] pytorch-cuda              12.1                 ha16c6d3_5    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchtriton               2.2.0                     py310    pytorch

cc @seemethere @malfet @osalpekar @atalman


min-jean-cho · 2024-04-02T20:27:59Z

cc. @CuiYifeng

This breaks the tiktorch backend, see pytorch/pytorch#123097

walidabualafia · 2024-04-04T16:37:13Z

Hi all,

I am currently hitting this bug as well. It breaks the installation of the latest tbepler/topaz. Is there an ETA for a fix?

Thank you! :)

ElHouas · 2024-04-05T10:19:05Z

I am experiencing this issue as well, when installing in a docker container using conda. Here is my conda env.yaml:

name: ml_env
channels:
  - pytorch
  - nvidia
  - conda-forge
  - nodefaults
dependencies:
  - python=3.10.7
  - mamba
  - pip
  - poetry=1.6.1
  - pytorch::pytorch=2.0.1
  - pytorch::torchaudio=2.0.2
  - pytorch::torchvision=0.15.2
  - pytorch::pytorch-cuda=11.8
platforms:
  - linux-64

Any advice?

Thanks! :)
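Since the original report says that downgrading MKL to 2024.0.0 resolves the error, one possible workaround for an environment file like the one above is to constrain mkl explicitly. The sketch below only illustrates that idea on a plain list of dependency strings; the helper names and the `mkl<2024.1` spec are assumptions, not something confirmed in this thread.

```python
# Assumption: a conda match spec that excludes the affected MKL releases.
MKL_PIN = "mkl<2024.1"


def is_mkl_spec(spec):
    """Return True if a dependency string names the mkl package itself."""
    name = spec.split("::")[-1]  # drop an optional "channel::" prefix
    for separator in ("=", "<", ">", "!", " "):
        name = name.split(separator)[0]
    return name == "mkl"


def with_mkl_pin(dependencies):
    """Return a copy of the dependency list with an mkl constraint added.

    An existing mkl entry is left untouched so a deliberate pin is not
    overridden.
    """
    if any(is_mkl_spec(dep) for dep in dependencies):
        return list(dependencies)
    return list(dependencies) + [MKL_PIN]
```

For example, `with_mkl_pin(["python=3.10.7", "pytorch::pytorch=2.0.1"])` appends `mkl<2024.1`, while a list that already contains `mkl=2024.0` is returned unchanged.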

StefanGitHuber · 2024-04-08T14:34:31Z

I can reproduce this issue as well:

I first hit this error:
ImportError: intel_extension_for_pytorch xpu libintel-ext-pt-gpu.so: undefined symbol for _ZNK5torch8autograd4Node4nameB5cxx11Ev
Following the thread "GPU examples undefined symbol" (intel/ipex-llm#8803), I ran
ldd /home/suhu/.local/lib/python3.10/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so
linux-vdso.so.1 (0x00007ffe60f8a000)
libtorch.so => not found
libtorch_cpu.so => not found
libc10.so => not found
libxetla_kernels.so => /home/suhu/.local/lib/python3.10/site-packages/intel_extension_for_pytorch/lib/libxetla_kernels.so (0x0000739e4f600000)
libmkl_intel_lp64.so.2 => /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_lp64.so.2 (0x0000739e4e000000)
libmkl_core.so.2 => /opt/intel/oneapi/mkl/2024.1/lib/libmkl_core.so.2 (0x0000739e49e00000)
libmkl_gnu_thread.so.2 => /opt/intel/oneapi/mkl/2024.1/lib/libmkl_gnu_thread.so.2 (0x0000739e48400000)
libmkl_sycl_blas.so.4 => /opt/intel/oneapi/mkl/2024.1/lib/libmkl_sycl_blas.so.4 (0x0000739e42c00000)
libmkl_sycl_lapack.so.4 => /opt/intel/oneapi/mkl/2024.1/lib/libmkl_sycl_lapack.so.4 (0x0000739e40400000)
libmkl_sycl_sparse.so.4 => /opt/intel/oneapi/mkl/2024.1/lib/libmkl_sycl_sparse.so.4 (0x0000739e39e00000)
libmkl_sycl_dft.so.4 => /opt/intel/oneapi/mkl/2024.1/lib/libmkl_sycl_dft.so.4 (0x0000739e36e00000)
libmkl_sycl_vm.so.4 => /opt/intel/oneapi/mkl/2024.1/lib/libmkl_sycl_vm.so.4 (0x0000739e2e000000)
libmkl_sycl_rng.so.4 => /opt/intel/oneapi/mkl/2024.1/lib/libmkl_sycl_rng.so.4 (0x0000739e26200000)
libmkl_sycl_stats.so.4 => /opt/intel/oneapi/mkl/2024.1/lib/libmkl_sycl_stats.so.4 (0x0000739e24200000)
libmkl_sycl_data_fitting.so.4 => /opt/intel/oneapi/mkl/2024.1/lib/libmkl_sycl_data_fitting.so.4 (0x0000739e23800000)
libze_loader.so.1 => /lib/x86_64-linux-gnu/libze_loader.so.1 (0x0000739ee9ac8000)
libOpenCL.so.1 => /opt/intel/oneapi/compiler/2024.1/opt/oclfpga/host/linux64/lib/libOpenCL.so.1 (0x0000739e23400000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000739ee9ac3000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x0000739ee9abe000)
libsvml.so => /opt/intel/oneapi/compiler/2024.1/lib/libsvml.so (0x0000739e21c00000)
libirng.so => /opt/intel/oneapi/compiler/2024.1/lib/libirng.so (0x0000739e49d07000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x0000739e21800000)
libimf.so => /opt/intel/oneapi/compiler/2024.1/lib/libimf.so (0x0000739e21200000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000739e52f19000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000739e4f5e0000)
libintlc.so.5 => /opt/intel/oneapi/compiler/2024.1/lib/libintlc.so.5 (0x0000739e4df9e000)
libsycl.so.7 => /opt/intel/oneapi/compiler/2024.1/lib/libsycl.so.7 (0x0000739e20e00000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000739e20a00000)
/lib64/ld-linux-x86-64.so.2 (0x0000739ee9b45000)
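The ldd listing above reports libtorch.so, libtorch_cpu.so and libc10.so as "not found". A small sketch for pulling such entries out of ldd output is shown here; the function name `unresolved_libraries` is hypothetical, and the parser assumes the usual `name => target` line layout seen in the listing.

```python
def unresolved_libraries(ldd_output):
    """Return the names of shared libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Lines without " => " (e.g. linux-vdso, the dynamic loader) are skipped.
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing
```

The input string could be obtained by capturing the output of the ldd command above, for instance with `subprocess.run(["ldd", path], capture_output=True, text=True).stdout`.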

pip install transformers==4.31.0
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu

Now the error appears right at the beginning, with
import torch
even before importing ipex with
import intel_extension_for_pytorch as ipex
I receive:

File "~/.local/lib/python3.10/site-packages/torch/__init__.py", line 235, in <module>
from torch._C import * # noqa: F403
ImportError: ~/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent

Here is my output from collect_env.py:

PyTorch version: N/A
PyTorch CXX11 ABI: N/A
IPEX version: N/A
IPEX commit: N/A
Build type: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: N/A
IGC version: 2024.1.0 (2024.1.0.20240308)
CMake version: version 3.26.4
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-26-generic-x86_64-with-glibc2.35
Is XPU available: N/A
DPCPP runtime version: 2024.1
MKL version: 2024.1
GPU models and configuration:
N/A
Intel OpenCL ICD version: 23.52.28202.39-82122.04
Level Zero version: 1.3.28202.39-82122.04

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
Model name: 12th Gen Intel(R) Core(TM) i7-12700H
CPU family: 6
Model: 154
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 1
Stepping: 3
CPU max MHz: 4700.0000
CPU min MHz: 400.0000
BogoMIPS: 5376.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr ibt flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 544 KiB (14 instances)
L1i cache: 704 KiB (14 instances)
L2 cache: 11.5 MiB (8 instances)
L3 cache: 24 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-19
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==2.1.10+xpu
[pip3] numpy==1.26.4
[pip3] pytorch-lightning==1.9.4
[pip3] pytorch-metric-learning==1.7.3
[pip3] torch==2.1.0a0+cxx11.abi
[pip3] torch-audiomentations==0.11.0
[pip3] torch-pitch-shift==1.2.4
[pip3] torch-stft==0.1.4
[pip3] torchaudio==2.1.0.post0+cxx11.abi
[pip3] torchmetrics==0.11.4
[pip3] torchvision==0.16.0a0+cxx11.abi
[conda] N/A

@endast

* added all changes from annotation-speedups branch
* added gtf and genotype mock file for github tests
* Delete example/annotations/preprocessing_workdir/preprocessed directory
* Update annotation_colnames_filling_values.yaml
* Corrected fill values for maf columns
* Changed protein_id merging and exon distance filtering, s.t. no annotations are dropped
* included rulegraph instead dag
* based on suggestions from @endast
* added version info for rockdb.yaml file
* updated rulegraph Updated Documentation corrected nonfunctional links
* added support for X/Y chromosomes, removed dependency on pvcf file
* excluded mkl version 2024.1.0 since it is crashing pytorch(pytorch/pytorch#123097)
* changed way file stems are assumed to include 'double ending' on input files.
* removed unused lines, removed pvcf from config file
* changed if statement for gene_id_file
---------
Co-authored-by: “Marcel-Mueck” <“mueckm1@gmail.com”>
Co-authored-by: PMBio <PMBio@users.noreply.github.com>

syedazi · 2024-04-11T09:47:02Z

Is there a resolution to this issue? I faced the same issue using conda.

conda create -n fsdp python=3.10
conda activate fsdp

# Install pytorch and other dependencies
conda install -y pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia

titeup · 2024-04-12T04:07:58Z

I don't have an exact explanation for the missing symbol, but here is what worked for me, in case it helps others.

I had the same error after installing ipex_llm.
I painfully found my way out: Python was mixing packages from the Intel Python in oneAPI 2024.1 with my local cache.

After loading the Intel environment, I ran:
python -m pip install torch==2.1.0.post0 torchvision==0.16.0.post0 torchaudio==2.1.0.post0 intel-extension-for-pytorch==2.1.20+xpu oneccl_bind_pt==2.1.200+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

This magic combination comes from this Doc

And then (needed in my case)
pip install transformers==4.36.2

I suppose those who want to use another Python can load only the MKL environment variables and use the PyTorch install that worked for them before (I haven't tried it myself).

Then everything worked fine.

jingxu10 · 2024-04-15T05:15:14Z

The reason is that PyTorch was built against an older MKL distribution that contains this symbol; the symbol was removed in MKL 2024.1.
The PyTorch binary released via the conda channel links to MKL dynamically, so it hits this error when MKL 2024.1 is installed.
The PyTorch binary released via pip (pip install) links to MKL statically. You can switch to the pip-installed binary to avoid this error with MKL 2024.1.
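The explanation above boils down to two conditions: the installed MKL is 2024.1 or newer, and the PyTorch binary links to MKL dynamically. The sketch below encodes that rule as a rough illustration; the function names are hypothetical, and the 2024.1 threshold is taken from the reports in this thread rather than from any official compatibility table.

```python
# Assumption: first MKL release reported in this thread as lacking the symbol.
FIRST_AFFECTED_MKL = (2024, 1)


def parse_version(text):
    """Turn '2024.1.0' into (2024, 1, 0); parsing stops at a non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def mkl_lacks_symbol(mkl_version):
    """True if the MKL version is at or above the first affected release."""
    return parse_version(mkl_version) >= FIRST_AFFECTED_MKL


def import_expected_to_fail(mkl_version, links_mkl_dynamically):
    """Combine both conditions: a statically linked build is unaffected."""
    return links_mkl_dynamically and mkl_lacks_symbol(mkl_version)
```

Under these assumptions, a conda build with MKL 2024.1.0 is flagged, while the same build with MKL 2024.0.0, or a pip build with any MKL version, is not.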

StefanGitHuber · 2024-04-15T10:27:51Z

Hi Jing Xu,
I have never used conda, only pip, to install PyTorch. I want to use the XPU of my Intel Arc A730M on my notebook, which runs Ubuntu 22.04.4 LTS.

Here are my steps again:

1. Install oneAPI

bash ./intelpython3-2024.1.0_814-Linux-x86_64.sh -b -u -p ~/intel/oneapi/intelpython
source ~/intel/oneapi/intelpython/env/vars.sh

This opens the environment in bash:
(oneapi-intelpython)

2. Install PyTorch extension

github.com/intel/intel-extension-for-pytorch/tree/xpu-main
python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

import torch
import intel_extension_for_pytorch as ipex
ImportError: ~/intel/oneapi/intelpython/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev
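
The `B5cxx11` fragment in the missing symbol is how the Itanium C++ ABI encodes an `[abi:cxx11]` tag, so the extension library appears to expect a torch build that uses the libstdc++ C++11 ABI. That reading is an assumption and is not confirmed in this thread. A minimal sketch for inspecting such a symbol follows; `has_cxx11_abi_tag` and `demangle` are hypothetical helpers, and `demangle` relies on the `c++filt` tool being installed.

```python
from __future__ import annotations

import shutil
import subprocess

# The Itanium C++ ABI encodes an ABI tag as 'B' + <length><name>, so
# [abi:cxx11] appears as the substring 'B5cxx11' in a mangled symbol.
CXX11_ABI_TAG = "B5cxx11"


def has_cxx11_abi_tag(mangled_name: str) -> bool:
    """Heuristic substring check for the cxx11 ABI tag in a mangled name."""
    return CXX11_ABI_TAG in mangled_name


def demangle(mangled_name: str) -> str | None:
    """Demangle with c++filt if it is available, otherwise return None."""
    tool = shutil.which("c++filt")
    if tool is None:
        return None
    result = subprocess.run(
        [tool, mangled_name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


symbol = "_ZNK5torch8autograd4Node4nameB5cxx11Ev"
print(has_cxx11_abi_tag(symbol))
print(demangle(symbol))
```

When `import torch` itself succeeds, `torch.compiled_with_cxx11_abi()` reports which ABI the installed torch build uses, which can be compared against what the extension expects.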

3. Solve _ZNK5torch8autograd4Node4nameB5cxx11Ev

https://community.intel.com/t5/Intel-Developer-Cloud/ImportError-libintel-ext-pt-gpu-so-undefined-symbol/m-p/1561667
pip install --pre --upgrade bigdl-llm[xpu_2.1] -f https://developer.intel.com/ipex-whl-stable-xpu

4. Solve iJIT_NotifyEvent

import torch
ImportError: ~/intel/oneapi/intelpython/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent

Possibly I shouldn't have performed step 3, the "upgrade" via installing bigdl-llm[xpu_2.1], which leads to this ImportError for iJIT_NotifyEvent. Showing only the diff from collect_env.py, it actually downgrades from the previous state

PyTorch version: 2.2.2+cu121
PyTorch CXX11 ABI: No
[pip3] intel-extension-for-pytorch==2.1.20+xpu
[pip3] torch==2.2.2
[pip3] torchvision==0.16.0.post0+cxx11.abi
[conda] intel-extension-for-pytorch 2.1.20+xpu pypi_0 pypi
[conda] torch 2.2.2 pypi_0 pypi
[conda] torchvision 0.16.0.post0+cxx11.abi pypi_0 pypi

to

PyTorch version: N/A
PyTorch CXX11 ABI: N/A
[pip3] intel-extension-for-pytorch==2.1.10+xpu
[pip3] numpy==1.26.4
[pip3] torch==2.1.0a0+cxx11.abi
[pip3] torchaudio==2.1.0.post0+cxx11.abi
[pip3] torchvision==0.16.0a0+cxx11.abi
[conda] intel-extension-for-pytorch 2.1.10+xpu pypi_0 pypi
[conda] torch 2.1.0a0+cxx11.abi pypi_0 pypi
[conda] torchvision 0.16.0a0+cxx11.abi pypi_0 pypi
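
One visible difference between the two listings is whether torch and intel-extension-for-pytorch share the same major.minor version (2.2 vs 2.1 in the first, 2.1 vs 2.1 in the second). A minimal sketch of that comparison follows, assuming that a matching major.minor is the relevant compatibility rule; that rule should be checked against the extension's own installation notes. The helper names are hypothetical.

```python
import re


def major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as '2.1.10+xpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_major_minor(torch_version: str, ipex_version: str) -> bool:
    """Assumed rule: both packages must share the same major.minor."""
    return major_minor(torch_version) == major_minor(ipex_version)


# Version strings taken from the two collect_env.py listings above.
print(same_major_minor("2.2.2", "2.1.20+xpu"))
print(same_major_minor("2.1.0a0+cxx11.abi", "2.1.10+xpu"))
```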

Question:
Apparently there are two torch installations, one installed via pip3 and one via conda. How do I switch to the one installed via pip? Do I have to set an environment variable pointing to it, and if so, how exactly?
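
A note on reading the listing: the `pypi` channel in a `[conda]` row normally marks a package that conda found in the environment but that pip installed, so the `[pip3]` and `[conda]` rows may describe one and the same installation. A minimal sketch for checking what the active interpreter would pick up, without importing the package (which fails here), is shown below. `describe_install` is a hypothetical helper built only on the standard library.

```python
from __future__ import annotations

import importlib.metadata
import importlib.util


def describe_install(module_name: str, dist_name: str | None = None) -> dict:
    """Report where a module would be imported from and who installed it.

    The module is located but not imported. The installer is read from
    the INSTALLER file in the package metadata, if one exists.
    """
    spec = importlib.util.find_spec(module_name)
    info = {
        "origin": spec.origin if spec is not None else None,
        "version": None,
        "installer": None,
    }
    try:
        dist = importlib.metadata.distribution(dist_name or module_name)
    except importlib.metadata.PackageNotFoundError:
        return info
    info["version"] = dist.version
    installer = dist.read_text("INSTALLER")
    info["installer"] = installer.strip() if installer else None
    return info


# Run this with the same interpreter that raises the ImportError.
print(describe_install("torch"))
```

If only one `origin` path shows up and the installer is `pip`, there is nothing to switch between, and the remaining question is which versions are installed in that one environment.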

Thanks in advance ...

* fixup! Format Python code with psf/black pull_request --------- Co-authored-by: “Marcel-Mueck” <“mueckm1@gmail.com”> Co-authored-by: PMBio <PMBio@users.noreply.github.com> commit 101feb2 Author: Marcel Mück <mueckm1@gmail.com> Date: Tue Apr 9 11:56:54 2024 +0200 Annotations new features (#54) * added all changes from annotation-speedups branch * added gtf and genotype mock file for github tests * Delete example/annotations/preprocessing_workdir/preprocessed directory * Update annotation_colnames_filling_values.yaml * Corrected fill values for maf columns * Changed protein_id merging and exon distance filtering, s.t. no annotations are dropped * included rulegraph instead dag * based on suggestions from @endast * added version info for rockdb.yaml file * updated rulegraph Updated Documentation corrected nonfunctional links * added support for X/Y chromosomes, removed dependency on pvcf file * excluded mkl version 2024.1.0 since it is crashing pytorch(pytorch/pytorch#123097) * changed way file stems are assumed to include 'double ending' on input files. 
* removed unused lines, removed pvcf from config file * changed if statement for gene_id_file --------- Co-authored-by: “Marcel-Mueck” <“mueckm1@gmail.com”> Co-authored-by: PMBio <PMBio@users.noreply.github.com> * Squashed commit of the following: commit 24b3af5 Author: Magnus Wahlberg <endast@gmail.com> Date: Tue Apr 16 10:40:45 2024 +0200 Optimize preprocessing (#65) * Add new test files * Update test_preprocess.py * Use parquet * Add brians code * Update preprocess.py * sort samples * Remove threads * Update exclude calls logic * Squashed commit of the following: commit 101feb2 Author: Marcel Mück <mueckm1@gmail.com> Date: Tue Apr 9 11:56:54 2024 +0200 Annotations new features (#54) * added all changes from annotation-speedups branch * added gtf and genotype mock file for github tests * Delete example/annotations/preprocessing_workdir/preprocessed directory * Update annotation_colnames_filling_values.yaml * Corrected fill values for maf columns * Changed protein_id merging and exon distance filtering, s.t. no annotations are dropped * included rulegraph instead dag * based on suggestions from @endast * added version info for rockdb.yaml file * updated rulegraph Updated Documentation corrected nonfunctional links * added support for X/Y chromosomes, removed dependency on pvcf file * excluded mkl version 2024.1.0 since it is crashing pytorch(pytorch/pytorch#123097) * changed way file stems are assumed to include 'double ending' on input files. 
* removed unused lines, removed pvcf from config file * changed if statement for gene_id_file --------- Co-authored-by: “Marcel-Mueck” <“mueckm1@gmail.com”> Co-authored-by: PMBio <PMBio@users.noreply.github.com> commit 628af87 Author: Marcel Mück <mueckm1@gmail.com> Date: Thu Apr 4 14:09:22 2024 +0200 Update preprocessing.md (#60) Corrected small spelling mistake commit 1356ed2 Author: Eva Holtkamp <59055511+HolEv@users.noreply.github.com> Date: Fri Mar 1 14:55:55 2024 +0100 Update dense_gt.py (#56) bugfix (had forgotten to remove sample_file = none) but the sample file is needed during cv training commit 4d9ef64 Author: Eva Holtkamp <59055511+HolEv@users.noreply.github.com> Date: Fri Feb 23 12:21:49 2024 +0100 Feature cv training (#55) * performance optimizations * train multiple repeats on single node in parallel * bug fix * fix bug in indexing when subset_samples() removed something * sleep between jobs; stop if any job fails * format with black * bug fixes * add test for MultiphenoDataloader * update environments * uncomment rules * bug fixes * subset samples in training_dataset rule * example config.yaml * use gpu queue for compute_burdens * bugfix since dask reading didn't work any more * allow evaluation of all repeat combinations * allow analysis of each n_repeats and for all repeat combinations * option to provide burden file * allow seed gene alpha to be defined in config * change sorting order to get the best model * adaptations to analyze multiple repeats and use script wo seed genes * allow to provide a sample file and do separate indexing for pheno and geno to ensure indices are correct * automatize generation of figure 3 (associations & repliation) * generate cv splits with related samples in the same split * average burdens * average burdens * cross-validation like trainign * add missing cv_utils * write average burdens or each combination to single zarr file to avoid zarr issues * add logging information * make maf column a param * add logging * 
pipeline replictaion and plotting * evaluate all repeat combis with and without seed genes * update lsf.yaml * small updates * per-gene pval aggregation * aggregate pval per gene * bugfix- only load burdens if not skip burdens * logging info * updates and fixes * load burdens only for genes analysed in current chunk to save memory * small changes to pipeline * standardizing/qt-transform of combined test set x/y arrays * my_quantile_transform for numpy arrays * bugfix * remove unnecessary code * remove unnecessary wildcards * make averaging part of associate.py * allow seed genes/baselines to be missing (to allow assoc. testing for non-training phenotypes) * updates * gene-specific common variant covariates for conditional analysis * bugfix * post-hoc conditioning on common variants * restructure pipelines * removing redundant options * add cv_utils cli * simplify script (only evaluate one repeat combi/average burdens); aggregate baseline pvalues; make bonferroni correction default * removal of redundant wildcards, updates and fixes * bugfixes * baseline discoveries only required for training phenotypes * remove not needed code * update configs * formatting * manually merge changes from feature-regenie to account for gene-specific annotations * allow different sample orders in phenotype_df and genotypes.h5 * change sample ids to be bytes as it is in the real data * update pipelines * update gitignore * pipeline updates * manually update github actions to be like master * bug fixes * checkout tests from master * make phenotype indices string as they are in real data * 'add gene_id' column * manually merge with master so tests can pass * bugfixes * use gene_id column instead of gene_ids * pipeline updates and fixes * update test config * adding age2 and age_sex to example data * update config * set tests folder to main version * checkout preprocssing files from main * checkout from main * manually merge sample_id changes from main * pipeline bugfixes and renamings * 
fixup! Format Python code with psf/black pull_request * remove gene_ids column * integrating suggested PR changes * fixup! Format Python code with psf/black pull_request --------- Co-authored-by: Brian Clarke <brian.clarke@dkfz.de> Co-authored-by: PMBio <PMBio@users.noreply.github.com> commit ada0aaa Author: Brian Clarke <9725212+bfclarke@users.noreply.github.com> Date: Wed Feb 21 15:56:14 2024 +0100 Feature regenie (#52) * convert burdens and phenotypes to SAIGE format * add function to make regenie input * modifications for regenie * bug fixes * update to use regenie * add function for mapping samples * implement burden export * convert burdens and phenotypes to SAIGE format * add function to make regenie input * modifications for regenie * bug fixes * update to use regenie * add function for mapping samples * implement burden export * add function to convert REGENIE output * don't show all unmapped samples if the list is long * don't parallelize REGENIE step 1 * separate pipelines with and without REGENIE * support gene-specific annotation * bug fix * bug fix * bug fix * bug fix * correct regenie_step1 --lowmem-prefix * modify to work standalone * add --association-only option * allow gene-specific annotation * go back to SEAK/statsmodels * bug fixes * remove SAIGE code, fix imports and conda envs * make pipelines more self-contained * don't require burdens.zarr when --skip-burdens is passed * udpate utils --------- Co-authored-by: Brian Clarke <brian.clarke@dkfz.de> * Revert "Squashed commit of the following:" This reverts commit ebde7c1. * Remove unused import * don't use mkl 2024.1.0 * update micromamba@v1.8.1 * Isolate failing test * test genotype matrix * Revert "test genotype matrix" This reverts commit 6deee9b. * Revert "Isolate failing test" This reverts commit 6a11fe3. * fixup! Format Python code with psf/black pull_request * remove files * Delete variants.tsv.gz * Update test_preprocess.py * Update test_preprocess.py * fixup! 
Format Python code with psf/black pull_request * Update test_preprocess.py * Update test-runner.yml * one test * Revert "one test" This reverts commit 05e4578. * Revert "Update test-runner.yml" This reverts commit ff78d30. * update call filter test data * Update expected data * Update deeprvat_preprocessing_env.yml Remove joblib * Squashed commit of the following: commit 101feb2 Author: Marcel Mück <mueckm1@gmail.com> Date: Tue Apr 9 11:56:54 2024 +0200 Annotations new features (#54) * added all changes from annotation-speedups branch * added gtf and genotype mock file for github tests * Delete example/annotations/preprocessing_workdir/preprocessed directory * Update annotation_colnames_filling_values.yaml * Corrected fill values for maf columns * Changed protein_id merging and exon distance filtering, s.t. no annotations are dropped * included rulegraph instead dag * based on suggestions from @endast * added version info for rockdb.yaml file * updated rulegraph Updated Documentation corrected nonfunctional links * added support for X/Y chromosomes, removed dependency on pvcf file * excluded mkl version 2024.1.0 since it is crashing pytorch(pytorch/pytorch#123097) * changed way file stems are assumed to include 'double ending' on input files. 
* removed unused lines, removed pvcf from config file * changed if statement for gene_id_file --------- Co-authored-by: “Marcel-Mueck” <“mueckm1@gmail.com”> Co-authored-by: PMBio <PMBio@users.noreply.github.com> commit 628af87 Author: Marcel Mück <mueckm1@gmail.com> Date: Thu Apr 4 14:09:22 2024 +0200 Update preprocessing.md (#60) Corrected small spelling mistake commit 1356ed2 Author: Eva Holtkamp <59055511+HolEv@users.noreply.github.com> Date: Fri Mar 1 14:55:55 2024 +0100 Update dense_gt.py (#56) bugfix (had forgotten to remove sample_file = none) but the sample file is needed during cv training commit 4d9ef64 Author: Eva Holtkamp <59055511+HolEv@users.noreply.github.com> Date: Fri Feb 23 12:21:49 2024 +0100 Feature cv training (#55) * performance optimizations * train multiple repeats on single node in parallel * bug fix * fix bug in indexing when subset_samples() removed something * sleep between jobs; stop if any job fails * format with black * bug fixes * add test for MultiphenoDataloader * update environments * uncomment rules * bug fixes * subset samples in training_dataset rule * example config.yaml * use gpu queue for compute_burdens * bugfix since dask reading didn't work any more * allow evaluation of all repeat combinations * allow analysis of each n_repeats and for all repeat combinations * option to provide burden file * allow seed gene alpha to be defined in config * change sorting order to get the best model * adaptations to analyze multiple repeats and use script wo seed genes * allow to provide a sample file and do separate indexing for pheno and geno to ensure indices are correct * automatize generation of figure 3 (associations & repliation) * generate cv splits with related samples in the same split * average burdens * average burdens * cross-validation like trainign * add missing cv_utils * write average burdens or each combination to single zarr file to avoid zarr issues * add logging information * make maf column a param * add logging * 
pipeline replictaion and plotting * evaluate all repeat combis with and without seed genes * update lsf.yaml * small updates * per-gene pval aggregation * aggregate pval per gene * bugfix- only load burdens if not skip burdens * logging info * updates and fixes * load burdens only for genes analysed in current chunk to save memory * small changes to pipeline * standardizing/qt-transform of combined test set x/y arrays * my_quantile_transform for numpy arrays * bugfix * remove unnecessary code * remove unnecessary wildcards * make averaging part of associate.py * allow seed genes/baselines to be missing (to allow assoc. testing for non-training phenotypes) * updates * gene-specific common variant covariates for conditional analysis * bugfix * post-hoc conditioning on common variants * restructure pipelines * removing redundant options * add cv_utils cli * simplify script (only evaluate one repeat combi/average burdens); aggregate baseline pvalues; make bonferroni correction default * removal of redundant wildcards, updates and fixes * bugfixes * baseline discoveries only required for training phenotypes * remove not needed code * update configs * formatting * manually merge changes from feature-regenie to account for gene-specific annotations * allow different sample orders in phenotype_df and genotypes.h5 * change sample ids to be bytes as it is in the real data * update pipelines * update gitignore * pipeline updates * manually update github actions to be like master * bug fixes * checkout tests from master * make phenotype indices string as they are in real data * 'add gene_id' column * manually merge with master so tests can pass * bugfixes * use gene_id column instead of gene_ids * pipeline updates and fixes * update test config * adding age2 and age_sex to example data * update config * set tests folder to main version * checkout preprocssing files from main * checkout from main * manually merge sample_id changes from main * pipeline bugfixes and renamings * 
fixup! Format Python code with psf/black pull_request * remove gene_ids column * integrating suggested PR changes * fixup! Format Python code with psf/black pull_request --------- Co-authored-by: Brian Clarke <brian.clarke@dkfz.de> Co-authored-by: PMBio <PMBio@users.noreply.github.com> commit ada0aaa Author: Brian Clarke <9725212+bfclarke@users.noreply.github.com> Date: Wed Feb 21 15:56:14 2024 +0100 Feature regenie (#52) * convert burdens and phenotypes to SAIGE format * add function to make regenie input * modifications for regenie * bug fixes * update to use regenie * add function for mapping samples * implement burden export * convert burdens and phenotypes to SAIGE format * add function to make regenie input * modifications for regenie * bug fixes * update to use regenie * add function for mapping samples * implement burden export * add function to convert REGENIE output * don't show all unmapped samples if the list is long * don't parallelize REGENIE step 1 * separate pipelines with and without REGENIE * support gene-specific annotation * bug fix * bug fix * bug fix * bug fix * correct regenie_step1 --lowmem-prefix * modify to work standalone * add --association-only option * allow gene-specific annotation * go back to SEAK/statsmodels * bug fixes * remove SAIGE code, fix imports and conda envs * make pipelines more self-contained * don't require burdens.zarr when --skip-burdens is passed * udpate utils --------- Co-authored-by: Brian Clarke <brian.clarke@dkfz.de> * Revert change of micromamba * Ruff check * Squashed commit of the following: commit ae5c83e Author: Marcel Mück <mueckm1@gmail.com> Date: Mon Apr 15 11:01:03 2024 +0200 fixed bugs in the annotation pipeline based on issues #61, #62 and #63. (#64) * fixed bugs in the annotation pipeline based on issues #61, #62 and #63. * fixup! 
Format Python code with psf/black pull_request --------- Co-authored-by: “Marcel-Mueck” <“mueckm1@gmail.com”> Co-authored-by: PMBio <PMBio@users.noreply.github.com> --------- Co-authored-by: PMBio <PMBio@users.noreply.github.com> commit ae5c83e Author: Marcel Mück <mueckm1@gmail.com> Date: Mon Apr 15 11:01:03 2024 +0200 fixed bugs in the annotation pipeline based on issues #61, #62 and #63. (#64) * fixed bugs in the annotation pipeline based on issues #61, #62 and #63. * fixup! Format Python code with psf/black pull_request --------- Co-authored-by: “Marcel-Mueck” <“mueckm1@gmail.com”> Co-authored-by: PMBio <PMBio@users.noreply.github.com> commit 101feb2 Author: Marcel Mück <mueckm1@gmail.com> Date: Tue Apr 9 11:56:54 2024 +0200 Annotations new features (#54) * added all changes from annotation-speedups branch * added gtf and genotype mock file for github tests * Delete example/annotations/preprocessing_workdir/preprocessed directory * Update annotation_colnames_filling_values.yaml * Corrected fill values for maf columns * Changed protein_id merging and exon distance filtering, s.t. no annotations are dropped * included rulegraph instead dag * based on suggestions from @endast * added version info for rockdb.yaml file * updated rulegraph Updated Documentation corrected nonfunctional links * added support for X/Y chromosomes, removed dependency on pvcf file * excluded mkl version 2024.1.0 since it is crashing pytorch(pytorch/pytorch#123097) * changed way file stems are assumed to include 'double ending' on input files. * removed unused lines, removed pvcf from config file * changed if statement for gene_id_file --------- Co-authored-by: “Marcel-Mueck” <“mueckm1@gmail.com”> Co-authored-by: PMBio <PMBio@users.noreply.github.com> * Revert "Squashed commit of the following:" This reverts commit 4e9b47d. --------- Co-authored-by: PMBio <PMBio@users.noreply.github.com>

LiutongZhou · 2024-08-09T20:27:45Z

pytorch/builder#1914 appears to be adding support for MKL 2024.2 for conda installs.

yanbing-j · 2024-08-12T01:11:04Z

pytorch/builder#1914 appears to be adding support for MKL 2024.2 for conda installs.

@xuhancn May I know the current status of this PR?

BashMocha · 2024-08-21T10:45:36Z

Seems to be mkl indeed. All it takes is (here with mamba, but conda would work too) this and then you can import torch successfully: mamba install mkl==2024.0

Worked for me as well.
conda install mkl==2024.0

thak123 · 2024-08-21T13:38:47Z

Yes, mkl==2024.0 worked for me as well.

pytorch/pytorch#123097

HyeJu99 · 2024-11-11T09:41:40Z

Seems to be mkl indeed. All it takes is (here with mamba, but conda would work too) this and then you can import torch successfully: mamba install mkl==2024.0

Worked for me as well. conda install mkl==2024.0

For me as well. conda install mkl==2024.0 worked, thank you : )

My environments are here:

OS: Ubuntu22.04LTS, Virtual Env: Conda-forge, Python: 3.7.12
GPU: Nvidia RTX4090 (Driver 550), CUDA: 11.8, cuDNN: 8.9.7, PyTorch: 1.13.1

richardrl · 2024-11-15T06:35:43Z

Still an issue with pytorch1.13.1 / cuda 11.7 and mkl 2024.2.2.

downgrading to 2024.0 fixed it

I think this is a small bug in PyTorch itself: if you follow the recommended installation order without running `conda install mkl==2024.0` before installing mmcv, you get the error `undefined symbol: iJIT_NotifyEvent`. Reference: pytorch/pytorch#123097

OrangeSodahub · 2025-01-07T12:53:50Z

Solved!

CamiloMartinezM · 2025-03-10T14:28:00Z

I am still getting this error with the following packages:

  blas                       2.131         mkl                          conda-forge
  blas-devel                 3.9.0         31_hcf00494_mkl              conda-forge
  libblas                    3.9.0         31_hfdb39a5_mkl              conda-forge
  libcblas                   3.9.0         31_h372d94f_mkl              conda-forge
  liblapack                  3.9.0         31_hc41d3b0_mkl              conda-forge
  liblapacke                 3.9.0         31_hbc6e62b_mkl              conda-forge
  mkl                        2024.2.2      ha957f24_16                  conda-forge
  mkl-devel                  2024.2.2      ha770c72_16                  conda-forge
  mkl-include                2024.2.2      ha957f24_16                  conda-forge

yanbing-j · 2025-03-11T02:34:16Z

This ImportError is caused by a mismatch between the MKL version used at compile time (MKL <= 2024.0) and the MKL version present at runtime (MKL >= 2024.1). That is, the MKL version in the official PyTorch wheel differs from the MKL version in the user's environment.

To use MKL >= 2024.1, use a PyTorch wheel >= 2.5. PyTorch <= 2.4 can use MKL >= 2024.1 only when built from source.
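The rule above can be written down as a small check. This is only a sketch of the rule as stated in this thread for official wheels (it says nothing about source builds); the names `parse_version` and `needs_mkl_downgrade` are hypothetical helpers, not part of PyTorch or MKL.

```python
import re


def parse_version(text):
    """Return the leading (major, minor) numbers of a version string.

    Local build tags such as "+cpu" are dropped before parsing.
    """
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(n) for n in numbers[:2])


def needs_mkl_downgrade(torch_version, mkl_version):
    """Assumption taken from this thread: official PyTorch wheels <= 2.4
    fail to import when MKL >= 2024.1 is installed, while wheels >= 2.5
    work with MKL 2024.2."""
    old_wheel = parse_version(torch_version) < (2, 5)
    new_mkl = parse_version(mkl_version) >= (2024, 1)
    return old_wheel and new_mkl


if __name__ == "__main__":
    # Combination reported earlier in this thread.
    print(needs_mkl_downgrade("1.13.1", "2024.2.2"))
```

If the function returns True, the options reported in this thread are `conda install mkl==2024.0` or moving to a PyTorch 2.5+ wheel.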

Details of the MKL version in PyTorch 2.4, 2.5 and 2.6 follow.

Official PyTorch builds with MKL 2024.2 started with https://github.com/pytorch/pytorch/pull/129022/files#diff-4e361b86d9119697c4b663cb1f905c0ee6be13f2e97b3fcc4146e55d5633e9e7R5, which is included in PyTorch 2.5 and later. I ran some tests using the PyTorch 2.4 to 2.6 wheels as follows:

conda create -n mkl_test python=3.10
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu #torch 2.6.0+cpu

>>> import torch
>>> torch.__config__.parallel_info()
'ATen/Parallel:\n\tat::get_num_threads() : 96\n\tat::get_num_interop_threads() : 96\nOpenMP 201511 (a.k.a. OpenMP 4.5)\n\tomp_get_max_threads() : 96\nIntel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications\n\tmkl_get_max_threads() : 96\nIntel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\nstd::thread::hardware_concurrency() : 192\nEnvironment variables:\n\tOMP_NUM_THREADS : [not set]\n\tMKL_NUM_THREADS : [not set]\nATen parallel backend: OpenMP\n'

pip install https://download.pytorch.org/whl/cpu/torch-2.5.0%2Bcpu-cp310-cp310-linux_x86_64.whl#sha256=7458180f01525424f8015dcb6051b8233fcf65966697b66f7b732c8a9aa0384f
>>> import torch
>>> torch.__config__.parallel_info()
'ATen/Parallel:\n\tat::get_num_threads() : 96\n\tat::get_num_interop_threads() : 96\nOpenMP 201511 (a.k.a. OpenMP 4.5)\n\tomp_get_max_threads() : 96\nIntel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications\n\tmkl_get_max_threads() : 96\nIntel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\nstd::thread::hardware_concurrency() : 192\nEnvironment variables:\n\tOMP_NUM_THREADS : [not set]\n\tMKL_NUM_THREADS : [not set]\nATen parallel backend: OpenMP\n'

pip install https://download.pytorch.org/whl/cpu/torch-2.4.0%2Bcpu-cp310-cp310-linux_x86_64.whl#sha256=0e59377b27823dda6d26528febb7ca06fc5b77816eaa58b4420cc8785e33d4ce
>>> import torch
>>> torch.__config__.parallel_info()
'ATen/Parallel:\n\tat::get_num_threads() : 96\n\tat::get_num_interop_threads() : 96\nOpenMP 201511 (a.k.a. OpenMP 4.5)\n\tomp_get_max_threads() : 96\nIntel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications\n\tmkl_get_max_threads() : 96\nIntel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)\nstd::thread::hardware_concurrency() : 192\nEnvironment variables:\n\tOMP_NUM_THREADS : [not set]\n\tMKL_NUM_THREADS : [not set]\nATen parallel backend: OpenMP\n'

Therefore, PyTorch wheels starting from 2.5 are linked against MKL 2024.2. For PyTorch wheels 2.4 and earlier, MKL should be downgraded to 2024.0 to avoid the ImportError.
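To read the linked MKL version without scanning the output by eye, the string returned by `torch.__config__.parallel_info()` can be searched with a regular expression. This is a minimal sketch that assumes the line format shown in the outputs above; `linked_mkl_version` is a hypothetical helper name, and the function takes the text as an argument so it can be tried without importing torch.

```python
import re


def linked_mkl_version(parallel_info_text):
    """Extract the MKL version from parallel_info() output, or None.

    Assumes a line of the form
    "Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build ...",
    as seen in the outputs posted in this thread.
    """
    match = re.search(
        r"Math Kernel Library Version (\d+(?:\.\d+)*)", parallel_info_text
    )
    return match.group(1) if match else None


if __name__ == "__main__":
    # Excerpt copied from the PyTorch 2.4.0+cpu output above.
    sample = (
        "OpenMP 201511 (a.k.a. OpenMP 4.5)\n"
        "Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build "
        "20220804 for Intel(R) 64 architecture applications\n"
    )
    print(linked_mkl_version(sample))
```

In a working environment it would be called as `linked_mkl_version(torch.__config__.parallel_info())`.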

yanbing-j · 2025-03-11T02:35:03Z

@LiutongZhou Can this issue be closed? Starting from PyTorch 2.5, oneMKL 2024.2 is supported.

yanbing-j · 2025-03-11T02:36:33Z

@CamiloMartinezM Please check your PyTorch version. If you want to use oneMKL 2024.2, you can switch to the PyTorch 2.5 wheel instead.

yanbing-j · 2025-03-11T02:37:51Z

Still an issue with pytorch1.13.1 / cuda 11.7 and mkl 2024.2.2.

downgrading to 2024.0 fixed it

@richardrl Could you please try the PyTorch 2.5 or PyTorch 2.6 wheel instead? They are built with oneMKL 2024.2.

LiutongZhou · 2025-03-11T20:42:46Z

@yanbing-j I confirm MKL 2024.2+ is supported by Pytorch 2.5+. This issue is resolved and can be closed.

LiutongZhou changed the title ~~undefined symbol: iJIT_NotifyEvent encountered when MKL 2024.1 is installed.~~ ImportError undefined symbol: iJIT_NotifyEvent encountered when MKL 2024.1 is installed. Apr 1, 2024

janeyx99 added the labels module: binaries, triaged, and module: mkl Apr 1, 2024

btbest added a commit to btbest/ilastik that referenced this issue Apr 4, 2024

Exclude mkl 2024.1

9d56884

This breaks the tiktorch backend, see pytorch/pytorch#123097

shaneknapp mentioned this issue Apr 4, 2024

Pytorch not importing berkeley-dsep-infra/datahub#5659

Closed

btbest added a commit to btbest/ilastik that referenced this issue Apr 5, 2024

Exclude mkl 2024.1

7ca646c

This breaks the tiktorch backend, see pytorch/pytorch#123097

Marcel-Mueck pushed a commit to PMBio/deeprvat that referenced this issue Apr 5, 2024

excluded mkl version 2024.1.0 since it is crashing pytorch(pytorch/py…

58c4769

…torch#123097)

jostorge mentioned this issue Apr 11, 2024

error loading pytorch jostorge/diffusion-hopping#4

Closed

tingyu66 mentioned this issue Apr 11, 2024

Fix CI issue in building docs rapidsai/cugraph#4336

Closed

sambartik mentioned this issue Apr 12, 2024

Intel ARC Support oobabooga/text-generation-webui#1575

Open

levnikolaevich mentioned this issue Apr 12, 2024

Problems with dependencies and environment River-Zhang/SIFU#20

Closed

NineMeowICT mentioned this issue Apr 12, 2024

ImportError: /home/gta/miniconda3/envs/llm/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent intel/ipex-llm#10550

Open

MKCarter mentioned this issue Aug 13, 2024

Make environment fails zrqiao/NeuralPLexer#1

Closed

mmarinriera mentioned this issue Sep 11, 2024

Package installation is broken in Ubuntu 20.24 TuragaLab/DECODE#237

Open

zhaojinjian0000 mentioned this issue Sep 12, 2024

ImportError when installing simple-knn and diff-gaussian-rasterization-confidence packages ali-vilab/Infusion#7

Closed

ricardoV94 added a commit to ricardoV94/pytensor that referenced this issue Oct 2, 2024

CI: Torch not compatible with recent mkl

3532922

pytorch/pytorch#123097

ricardoV94 added a commit to ricardoV94/pytensor that referenced this issue Oct 2, 2024

CI: Torch not compatible with recent mkl

a9ed0f9

pytorch/pytorch#123097

ricardoV94 added a commit to pymc-devs/pytensor that referenced this issue Oct 2, 2024

CI: Torch not compatible with recent mkl

8a6e407

pytorch/pytorch#123097

QuentinVitt mentioned this issue Oct 4, 2024

ModuleNotFoundError: No module named 'vllm.distributed' intel/ipex-llm#12151

Open

yoshipon mentioned this issue Oct 31, 2024

Migration from Anaconda to conda-forge espnet/espnet#5924

Merged

Ch0ronomato pushed a commit to Ch0ronomato/pytensor that referenced this issue Nov 2, 2024

CI: Torch not compatible with recent mkl

8a9572b

pytorch/pytorch#123097

PatWalters mentioned this issue Nov 17, 2024

Error executing 01_profile.ipynb coleygroup/shepherd-score#1

Closed

nblumoe mentioned this issue Nov 18, 2024

mkl package not found PetervanLunteren/AddaxAI#56

Closed

FreeWillThorn mentioned this issue Dec 27, 2024

Update README.md open-mmlab/mmyolo#1044

Open

enmarchi mentioned this issue Jan 13, 2025

Errors when installing the project: package python-3.7.13-haa1d7c7_1 is excluded by strict repo priority, and m-kruse98/SplatPose#6

Closed

ThomasBaruzier mentioned this issue Mar 5, 2025

[Bug] Newly added IQ1-2 quants produces garbage output using Unsloth quants and default optimize rules kvcache-ai/ktransformers#782

Open

5 tasks

LiutongZhou closed this as completed Mar 11, 2025

rsxdalv mentioned this issue Apr 3, 2025

undefined symbol: iJIT_NotifyEvent rsxdalv/tts-generation-webui#471

Closed
