
Pickle portability little 🡒 big endian #21237

Closed
@sgundura

Description

Describe the bug

We are trying to load, on AIX (which runs on the big-endian POWER architecture), some AI models that were dumped with joblib on a little-endian machine. Most of them load fine, but two models raise errors. We also tried loading them on Ubuntu Linux running on POWER (big endian) and got the same error. We even built the latest nightly version of the sklearn module and still got this error. The programs we used to dump and load the models are attached below.
The two models that didn't work are:

  1. KNearestNeighbor
  2. RandomForest
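Why byte order matters at all: the same bytes decode to different numbers depending on the order in which they are read. The stdlib sketch below (an illustration only, not sklearn or joblib code) packs a float64 the way a little-endian machine stores it and decodes it both ways. As far as we understand, NumPy records the byte order in each array's dtype, so an array pickled on a little-endian machine should arrive on a big-endian machine tagged as little-endian (`<f8`) rather than being silently misread; the errors below suggest the failure happens when such non-native arrays reach sklearn's compiled code.

```python
import struct

# The eight bytes a little-endian machine writes for the float64 value 1.5.
payload = struct.pack("<d", 1.5)

# Decoding with the byte order the data was written in recovers 1.5...
(as_written,) = struct.unpack("<d", payload)

# ...while decoding the same bytes as big-endian yields an unrelated number.
(as_swapped,) = struct.unpack(">d", payload)

print(as_written)  # 1.5
print(as_swapped)
```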

Steps/Code to Reproduce

KNearestNeighbor_Dump.py (to be run on the little-endian machine):

import joblib

# Load dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier as KNN

iris = load_iris()

X = iris.data
y = iris.target

# Split dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)

knn = KNN(n_neighbors=3)

# train model
knn.fit(X_train, y_train)
result = knn.predict(X_test)
print(result)

dump_file = "./KNearestNeighbor.joblib"
tuple_pickle = (knn, X_test, result)
with open(dump_file, 'wb') as file_dump:
    joblib.dump(tuple_pickle, file_dump)

KNearestNeighbor_Load.py (to be run on the big-endian machine after copying the KNearestNeighbor.joblib file generated by the program above):

import joblib
from sklearn.neighbors import KNeighborsClassifier as KNN

dump_file = "./KNearestNeighbor.joblib"
with open(dump_file, 'rb') as file_reader:
    trained_model, testing_data, original_prediction = joblib.load(file_reader)

dump_result = trained_model.predict(testing_data)
print(dump_result)
print(original_prediction)

RandomForest_Dump.py (to be run on the little-endian machine):

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Create the dataset
iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=13)

# Train the random forest
rf = RandomForestClassifier()
rf.fit(X_train, y_train)

# Predict on the held-out data
result = rf.predict(X_test)
print(result)

# Save the model, test data and predictions with joblib
job_obj = (rf, X_test, result)
joblib.dump(job_obj, "./random_forest.joblib")

RandomForest_Load.py (to be run on the big-endian machine after copying the random_forest.joblib file generated by the program above):

import joblib

# Load the model, test data and original predictions dumped on the little-endian machine
load_RF, X_test_job, result_orig = joblib.load("./random_forest.joblib")

result_test = load_RF.predict(X_test_job)
print(result_orig)
print(result_test)

Expected Results

No error is thrown, and the reloaded models produce the same predictions as the originals.
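The load scripts print the original and reloaded predictions side by side for visual comparison. A programmatic version of that check, as a minimal sketch assuming NumPy (`predictions_match` is a hypothetical helper, not part of sklearn or joblib): because `np.array_equal` compares values rather than raw bytes, it should report a match even if the stored prediction array kept its little-endian dtype after loading on a big-endian machine.

```python
import numpy as np


def predictions_match(original, reloaded):
    """Return True if two prediction arrays hold the same labels in the same order."""
    # np.array_equal compares shapes and element values, not raw bytes, so
    # arrays that differ only in byte order still compare equal.
    return bool(np.array_equal(original, reloaded))


# Example: the same labels stored with native and with swapped byte order.
native = np.array([0, 1, 2, 2, 1])
swapped = native.astype(native.dtype.newbyteorder("S"))
print(predictions_match(native, swapped))  # True
```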

Actual Results

python3 KNearestNeighborReadDump.py
ValueError: Little-endian buffer not supported on big-endian compiler
Exception ignored in: 'sklearn.neighbors._dist_metrics.get_vec_ptr'
ValueError: Little-endian buffer not supported on big-endian compiler
ValueError: Little-endian buffer not supported on big-endian compiler
Exception ignored in: 'sklearn.neighbors._dist_metrics.get_mat_ptr'
ValueError: Little-endian buffer not supported on big-endian compiler
Traceback (most recent call last):
  File "KNearestNeighborReadDump.py", line 6, in <module>
    trained_model, testing_data, original_prediction = joblib.load(file_reader)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 575, in load
    obj = _unpickle(fobj)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 329, in load_build
    Unpickler.load_build(self)
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1552, in load_build
    setstate(state)
  File "sklearn/neighbors/_binary_tree.pxi", line 1164, in sklearn.neighbors._kd_tree.BinaryTree.__setstate__
  File "sklearn/neighbors/_binary_tree.pxi", line 1105, in sklearn.neighbors._kd_tree.BinaryTree._update_memviews
  File "sklearn/neighbors/_binary_tree.pxi", line 204, in sklearn.neighbors._kd_tree.get_memview_DTYPE_2D
ValueError: Little-endian buffer not supported on big-endian compiler
python3 RandomForest_Load.py
/opt/freeware/lib64/python3.7/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.24.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
Traceback (most recent call last):
  File "RandomForest_Load.py", line 4, in <module>
    load_RF,X_test_job,reult_orig = joblib.load("../dumps/random_forest.joblib")
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1436, in load_reduce
    stack[-1] = func(*args)
  File "sklearn/tree/_tree.pyx", line 595, in sklearn.tree._tree.Tree.__cinit__
ValueError: Little-endian buffer not supported on big-endian compiler
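Both tracebacks end inside sklearn's compiled code: `get_memview_DTYPE_2D` while `BinaryTree.__setstate__` rebuilds the KD-tree, and `Tree.__cinit__` while a decision tree is rebuilt. The message appears to come from Cython's buffer-format check, which rejects arrays whose byte order differs from the machine's own. A minimal sketch of the kind of normalization a fix would likely need, assuming NumPy: convert any non-native array to native byte order before it reaches typed code. `to_native` is a hypothetical helper for illustration, not an sklearn function.

```python
import sys

import numpy as np


def to_native(arr):
    """Return arr with a native-byte-order dtype, copying only if needed."""
    arr = np.asarray(arr)
    if arr.dtype.isnative:
        return arr
    # astype to the same dtype with native ('=') byte order swaps the bytes
    # while preserving the numeric values.
    return arr.astype(arr.dtype.newbyteorder("="))


# Simulate an array that arrived from a machine with the opposite byte order.
foreign = ">" if sys.byteorder == "little" else "<"
incoming = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=foreign + "f8")

fixed = to_native(incoming)
print(incoming.dtype.isnative)          # False
print(fixed.dtype.isnative)             # True
print(np.array_equal(fixed, incoming))  # True
```

Returning the input unchanged when it is already native avoids an extra copy in the common same-endian case.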

Versions

System:
    python: 3.7.10 (default, Jun  1 2021, 05:23:20)  [GCC 8.3.0]
executable: /opt/freeware/bin/python3
   machine: AIX-2-00C581D74C00-powerpc-64bit-COFF

Python dependencies:
          pip: 20.1.1
   setuptools: 47.1.0
      sklearn: 0.24.2
        numpy: 1.20.3
        scipy: 1.6.3
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.0.1
threadpoolctl: 2.1.0

Built with OpenMP: True
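The block above appears to be the output of `sklearn.show_versions()`, and it does not state the machine's byte order directly. For a cross-endian report it may help to record that for both the dumping and the loading machine; a small stdlib sketch (`platform_report` is a hypothetical name):

```python
import platform
import sys


def platform_report():
    """Collect the facts that matter for a cross-endian pickle report."""
    return {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "byteorder": sys.byteorder,  # "little" or "big"
    }


if __name__ == "__main__":
    for key, value in platform_report().items():
        print(f"{key}: {value}")
```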
