Closed
Describe the bug
We are trying to load several models with joblib on AIX, which runs on the big-endian Power architecture. The models were dumped with joblib on a little-endian machine. Most of them load correctly, but two fail with errors. We see the same error when loading them on Ubuntu Linux running on big-endian Power, and also with the latest nightly build of scikit-learn. The programs we used to dump and load the models are included below.
The two models that didn't work are:
- KNearestNeighbor
- RandomForest
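For context, here is a minimal standard-library sketch of what "little endian" versus "big endian" means for a stored float64 value. It is only an illustration of the byte-order difference, not code taken from joblib or scikit-learn, and the variable names are made up for this example.

```python
import struct
import sys

value = 1.5

# The same float64 value serialized with each byte order.
little = struct.pack("<d", value)  # layout used by x86 and ppc64le
big = struct.pack(">d", value)     # layout used by AIX on Power

print("native byte order:", sys.byteorder)
print("little-endian bytes:", little.hex())
print("big-endian bytes:   ", big.hex())

# Reading little-endian bytes as if they were big-endian gives a different
# number, which is why a consumer must either convert or reject such data.
misread = struct.unpack(">d", little)[0]
print("misread value:", misread)
```

The two byte strings are exact reverses of each other, so data written on one kind of machine has to be byte-swapped (or explicitly tagged with its byte order) before the other kind can use it as native data.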
Steps/Code to Reproduce
KNearestNeighbor_Dump.py (to be run on the little-endian machine):
import joblib
# Load dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier as KNN
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)
knn = KNN(n_neighbors=3)
# train model
knn.fit(X_train, y_train)
result = knn.predict(X_test)
print(result)
dump_file = "./KNearestNeighbor.joblib"
tuple_pickle = (knn, X_test, result)
with open(dump_file, 'wb') as file_dump:
    joblib.dump(tuple_pickle, file_dump)
KNearestNeighbor_Load.py (to be run on the big-endian machine after copying the KNearestNeighbor.joblib file generated by the program above):
import joblib
from sklearn.neighbors import KNeighborsClassifier as KNN
dump_file = "./KNearestNeighbor.joblib"
with open(dump_file, 'rb') as file_reader:
    trained_model, testing_data, original_prediction = joblib.load(file_reader)
dump_result = trained_model.predict(testing_data)
print(dump_result)
print(original_prediction)
RandomForest_Dump.py (to be run on the little-endian machine):
import os
import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
#Creating datasets
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=13)
#Training the RandomForest
rf = RandomForestClassifier()
rf.fit(X_train,y_train)
#output
result=rf.predict(X_test)
print(result)
#saving using the joblib
job_obj = (rf, X_test, result)
joblib.dump(job_obj, "./random_forest.joblib")
RandomForest_Load.py (to be run on the big-endian machine after copying the random_forest.joblib file generated by the program above):
import joblib
from sklearn.model_selection import train_test_split
load_RF,X_test_job,result_orig = joblib.load("./random_forest.joblib")
load_RF.predict(X_test_job)
result_test = load_RF.predict(X_test_job)
print(result_orig)
print(result_test)
Expected Results
No error is thrown.
Actual Results
python3 KNearestNeighborReadDump.py
ValueError: Little-endian buffer not supported on big-endian compiler
Exception ignored in: 'sklearn.neighbors._dist_metrics.get_vec_ptr'
ValueError: Little-endian buffer not supported on big-endian compiler
ValueError: Little-endian buffer not supported on big-endian compiler
Exception ignored in: 'sklearn.neighbors._dist_metrics.get_mat_ptr'
ValueError: Little-endian buffer not supported on big-endian compiler
Traceback (most recent call last):
  File "KNearestNeighborReadDump.py", line 6, in <module>
    trained_model, testing_data, original_prediction = joblib.load(file_reader)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 575, in load
    obj = _unpickle(fobj)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 329, in load_build
    Unpickler.load_build(self)
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1552, in load_build
    setstate(state)
  File "sklearn/neighbors/_binary_tree.pxi", line 1164, in sklearn.neighbors._kd_tree.BinaryTree.__setstate__
  File "sklearn/neighbors/_binary_tree.pxi", line 1105, in sklearn.neighbors._kd_tree.BinaryTree._update_memviews
  File "sklearn/neighbors/_binary_tree.pxi", line 204, in sklearn.neighbors._kd_tree.get_memview_DTYPE_2D
ValueError: Little-endian buffer not supported on big-endian compiler
python3 RandomForest_Load.py
/opt/freeware/lib64/python3.7/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.24.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
Traceback (most recent call last):
  File "RandomForest_Load.py", line 4, in <module>
    load_RF,X_test_job,reult_orig = joblib.load("../dumps/random_forest.joblib")
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1436, in load_reduce
    stack[-1] = func(*args)
  File "sklearn/tree/_tree.pyx", line 595, in sklearn.tree._tree.Tree.__cinit__
ValueError: Little-endian buffer not supported on big-endian compiler
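Both tracebacks end in compiled (Cython) code, and the message suggests that an array whose dtype is still marked little-endian is being handed to code that only accepts native byte order. As a rough illustration of what a conversion to native order looks like at the NumPy level, here is a minimal sketch. The helper name `to_native` is hypothetical, and this is not necessarily how joblib or scikit-learn handle (or should handle) the problem.

```python
import numpy as np

def to_native(arr):
    """Return arr in the machine's native byte order, copying only if needed."""
    if arr.dtype.isnative:
        return arr
    # "=" requests native byte order; astype converts the values accordingly.
    return arr.astype(arr.dtype.newbyteorder("="))

# Simulate an array that came from a machine with the opposite byte order:
# newbyteorder() with no argument swaps the dtype's byte order.
native = np.arange(4, dtype=np.float64)
foreign = native.astype(native.dtype.newbyteorder())

fixed = to_native(foreign)
print(foreign.dtype.isnative, fixed.dtype.isnative)
print(np.array_equal(foreign, fixed))
```

The values are unchanged by the conversion; only the in-memory layout and the dtype's byte-order flag differ, which is the property that native-only compiled code checks.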
Versions
System:
    python: 3.7.10 (default, Jun 1 2021, 05:23:20) [GCC 8.3.0]
    executable: /opt/freeware/bin/python3
    machine: AIX-2-00C581D74C00-powerpc-64bit-COFF
Python dependencies:
    pip: 20.1.1
    setuptools: 47.1.0
    sklearn: 0.24.2
    numpy: 1.20.3
    scipy: 1.6.3
    Cython: None
    pandas: None
    matplotlib: None
    joblib: 1.0.1
    threadpoolctl: 2.1.0
Built with OpenMP: True