loading my own datasets #3808

MartinLion · 2014-10-28T16:41:25Z

Hi all,

I am very new in scikit-learn.

My questions is: how to download my own dataset (csv file).

I will be highly appreciated any answers.

Thanks.
Martin

AlexanderFabisch · 2014-10-28T22:45:14Z

The documentation of sklearn is really very useful and should answer your question: http://scikit-learn.org (basically you have to put your data in numpy arrays)

jnothman · 2014-10-28T23:37:16Z

This is something that could have a bit more documentation than is in there
currently. You might find Pandas useful.

On 29 October 2014 09:45, Alexander Fabisch notifications@github.com
wrote:

The documentation of sklearn is really very useful and should answer
your question: http://scikit-learn.org (basically you have to put your
data in numpy arrays)

—
Reply to this email directly or view it on GitHub
#3808 (comment)
.

AlexanderFabisch · 2014-10-29T07:07:50Z

@jnothman Should we reopen this issue and add a new section in the documentation? For example in this section: http://scikit-learn.org/stable/tutorial/basic/tutorial.html ("Loading your own data").

jnothman · 2014-10-29T07:25:02Z

see #2801

On 29 October 2014 18:07, Alexander Fabisch notifications@github.com
wrote:

@jnothman https://github.com/jnothman Should we reopen this issue and
add a new section in the documentation? For example in this section:
http://scikit-learn.org/stable/tutorial/basic/tutorial.html ("Loading
your own data").

—
Reply to this email directly or view it on GitHub
#3808 (comment)
.

MartinLion · 2014-10-29T10:15:50Z

My own dataset means the dataset that I have collected by my self, not the
standard dataset that all machine learning have in their depositories (e.g.
iris or diabetes).

I have a simple csv file and I on my desktop and I want to load it inside
scikit-learn. That will allow me to use scikit-learn properly and introduce
it to my colleges to serve our community.

I need a very simple and easy way to do so.

I will be highly appreciated any useful advice.

On 29 October 2014 15:25, jnothman notifications@github.com wrote:

see #2801

On 29 October 2014 18:07, Alexander Fabisch notifications@github.com
wrote:

@jnothman https://github.com/jnothman Should we reopen this issue and
add a new section in the documentation? For example in this section:
http://scikit-learn.org/stable/tutorial/basic/tutorial.html ("Loading
your own data").

—
Reply to this email directly or view it on GitHub
<
https://github.com/scikit-learn/scikit-learn/issues/3808#issuecomment-60882069>

.

—
Reply to this email directly or view it on GitHub
#3808 (comment)
.

jnothman · 2014-10-29T11:12:22Z

See http://pandas.pydata.org/pandas-docs/stable/io.html

On 29 October 2014 21:15, MartinLion notifications@github.com wrote:

My own dataset means the dataset that I have collected by my self, not
the
standard dataset that all machine learning have in their depositories
(e.g.
iris or diabetes).

I have a simple csv file and I on my desktop and I want to load it inside
scikit-learn. That will allow me to use scikit-learn properly and
introduce
it to my colleges to serve our community.

I need a very simple and easy way to do so.

I will be highly appreciated any useful advice.

On 29 October 2014 15:25, jnothman notifications@github.com wrote:

see #2801

On 29 October 2014 18:07, Alexander Fabisch notifications@github.com
wrote:

@jnothman https://github.com/jnothman Should we reopen this issue
and
add a new section in the documentation? For example in this section:
http://scikit-learn.org/stable/tutorial/basic/tutorial.html ("Loading
your own data").

—
Reply to this email directly or view it on GitHub
<

https://github.com/scikit-learn/scikit-learn/issues/3808#issuecomment-60882069>

.

—
Reply to this email directly or view it on GitHub
<
https://github.com/scikit-learn/scikit-learn/issues/3808#issuecomment-60883212>

.

—
Reply to this email directly or view it on GitHub
#3808 (comment)
.

MartinLion · 2014-10-29T12:19:24Z

Thanks for the link. I checked it out, but the process looks complicated.
Perhaps if there is a short youtube video explains the process much easier,
otherwise I do not know what to do to solve this matter.

On 29 October 2014 19:12, jnothman notifications@github.com wrote:

See http://pandas.pydata.org/pandas-docs/stable/io.html

On 29 October 2014 21:15, MartinLion notifications@github.com wrote:

My own dataset means the dataset that I have collected by my self, not
the
standard dataset that all machine learning have in their depositories
(e.g.
iris or diabetes).

I have a simple csv file and I on my desktop and I want to load it
inside
scikit-learn. That will allow me to use scikit-learn properly and
introduce
it to my colleges to serve our community.

I need a very simple and easy way to do so.

I will be highly appreciated any useful advice.

On 29 October 2014 15:25, jnothman notifications@github.com wrote:

see #2801

On 29 October 2014 18:07, Alexander Fabisch notifications@github.com

wrote:

@jnothman https://github.com/jnothman Should we reopen this issue
and
add a new section in the documentation? For example in this section:
http://scikit-learn.org/stable/tutorial/basic/tutorial.html
("Loading
your own data").

—
Reply to this email directly or view it on GitHub
<

https://github.com/scikit-learn/scikit-learn/issues/3808#issuecomment-60882069>

.

—
Reply to this email directly or view it on GitHub
<

https://github.com/scikit-learn/scikit-learn/issues/3808#issuecomment-60883212>

.

—
Reply to this email directly or view it on GitHub
<
https://github.com/scikit-learn/scikit-learn/issues/3808#issuecomment-60899671>

.

—
Reply to this email directly or view it on GitHub
#3808 (comment)
.

jnothman · 2014-10-29T12:45:42Z

It probably looks something like:

import pandas as pd
data = pd.read_csv(open('myfile.csv'))
target = data[target_column_name]
del data[target_column_name]

Then fit a scikit-learn estimator

SVC().fit(data, target)

On 29 October 2014 23:19, MartinLion notifications@github.com wrote:

Thanks for the link. I checked it out, but the process looks complicated.
Perhaps if there is a short youtube video explains the process much
easier,
otherwise I do not know what to do to solve this matter.

On 29 October 2014 19:12, jnothman notifications@github.com wrote:

See http://pandas.pydata.org/pandas-docs/stable/io.html

On 29 October 2014 21:15, MartinLion notifications@github.com wrote:

My own dataset means the dataset that I have collected by my self, not
the
standard dataset that all machine learning have in their depositories
(e.g.
iris or diabetes).

I have a simple csv file and I on my desktop and I want to load it
inside
scikit-learn. That will allow me to use scikit-learn properly and
introduce
it to my colleges to serve our community.

I need a very simple and easy way to do so.

I will be highly appreciated any useful advice.

On 29 October 2014 15:25, jnothman notifications@github.com wrote:

see #2801

On 29 October 2014 18:07, Alexander Fabisch <
notifications@github.com>

wrote:

@jnothman https://github.com/jnothman Should we reopen this
issue
and
add a new section in the documentation? For example in this
section:
http://scikit-learn.org/stable/tutorial/basic/tutorial.html
("Loading
your own data").

—
Reply to this email directly or view it on GitHub
<

https://github.com/scikit-learn/scikit-learn/issues/3808#issuecomment-60882069>

.

—
Reply to this email directly or view it on GitHub
<

https://github.com/scikit-learn/scikit-learn/issues/3808#issuecomment-60883212>

.

—
Reply to this email directly or view it on GitHub
<

https://github.com/scikit-learn/scikit-learn/issues/3808#issuecomment-60899671>

.

—
Reply to this email directly or view it on GitHub
<
https://github.com/scikit-learn/scikit-learn/issues/3808#issuecomment-60906075>

.

—
Reply to this email directly or view it on GitHub
#3808 (comment)
.

MechCoder · 2014-10-29T13:40:34Z

You could also have a look at np.genfromtxt . Might be useful.

MartinLion · 2014-10-29T14:08:48Z

Hi jnothman,
Thank you so much for your help, I really appreciate your cooperation.

I tried applying your code. Thus, once I interned (import pandas as pd). Directly I had the following message in red color:

import pandas as pd
No module named 'dateutil'
Traceback (most recent call last):
File "<pyshell#0>", line 1, in
import pandas as pd
File "C:\Python34\lib\site-packages\pandas__init__.py", line 7, in
from . import hashtable, tslib, lib
File "pandas\tslib.pyx", line 37, in init pandas.tslib (pandas\tslib.c:76813)
ImportError: No module named 'dateutil'

What should I do?
Thanks a lot

MechCoder · 2014-10-29T14:15:04Z

It just means you do not have the dateutil module installed. You can install it by doing

sudo apt-get install python-dateutil

hth

MechCoder · 2014-10-29T14:17:06Z

You can have a look at this for more details, http://stackoverflow.com/questions/20853474/importerror-no-module-named-dateutil-parser

MartinLion · 2014-10-29T15:55:28Z

Thanks MechCoder for your contribution.

I tried "sudo apt-get install python-dateutil", but it is not clear to me at what stage should indicate this code?
Do you think that there is an easy way to load my (excel or csv) file suing any simple ways such as open folder (regular way). There is another matter also which how to determine the class label that I want to predict form my dataset using scikit-learn. But anyway this step supposed to be after loading the file itself. Not easy process at all.

Is there any youtube tutorial about loading dataset (not iris which is everywhere or other famous. stuff). Video is easy than links

afzal · 2015-08-22T00:33:11Z

HI all,
I wrote the following code:

import pandas as pd
data= pd.read_csv(open(home/maxinet/Desktop/1.csv))

and i got this error:
File "", line 2
data= pd.read_csv(open(home/maxinet/Desktop/1.csv))
^
SyntaxError: invalid syntax

could you plz guide me.

AlexanderFabisch · 2015-08-22T06:11:03Z

We could tell you what the problem is but I think in this case you will learn more from it if you find it on your own. You should read the error message carefully. It is a Python syntax error.

  File "", line 2
    data= pd.read_csv(open(home/maxinet/Desktop/1.csv))
                                                    ^
SyntaxError: invalid syntax

MartinLion · 2015-08-22T08:03:55Z

On 22 Aug 2015 08:33, "samira afzal" notifications@github.com wrote:

HI all,
I wrote the following code:

import pandas as pd
data= pd.read_csv(open(home/maxinet/Desktop/1.csv))

and i got this error:
File "", line 2
data= pd.read_csv(open(home/maxinet/Desktop/1.csv))
^
SyntaxError: invalid syntax

could you plz guide me.

I recommend you finding another tool where you can work with easily without
headache yourself with this particular one. This is what I did myself.

Good luck
Martin

—
Reply to this email directly or view it on GitHub.

jnothman · 2015-08-22T10:33:01Z

To be clear, these previous posters are saying that being somewhat
comfortable with the Python language is a prerequisite to using
scikit-learn. You have missed some quotes around a string. This shows great
unfamiliarity with Python (and a characteristic of most programming
languages), and scikit-learn is probably not the best place to start.

On 22 August 2015 at 18:04, MartinLion notifications@github.com wrote:

On 22 Aug 2015 08:33, "samira afzal" notifications@github.com wrote:

HI all,
I wrote the following code:

import pandas as pd
data= pd.read_csv(open(home/maxinet/Desktop/1.csv))

and i got this error:
File "", line 2
data= pd.read_csv(open(home/maxinet/Desktop/1.csv))
^
SyntaxError: invalid syntax

could you plz guide me.

I recommend you finding another tool where you can work with easily without
headache yourself with this particular one. This is what I did myself.

Good luck
Martin

—
Reply to this email directly or view it on GitHub.

—
Reply to this email directly or view it on GitHub
#3808 (comment)
.

davidavdav · 2015-11-27T08:45:51Z

Just want to support @MartinLion --- I am a scikit-learn newbie and have just have spent a frustrating time going thought the docs, and I can't find anywhere how to read my own data (and not a prepared toy dataset), and what the python format of data is.

raghavrv · 2015-11-27T10:25:25Z

Kindly refer -

GaelVaroquaux · 2015-11-27T11:05:12Z

• How do I load my data to work with scikit-learn? • How to load data from CSV file?

We should add these in the FAQ.

raghavrv · 2015-11-27T14:43:55Z

should we instead add as a section in the tutorial below/above "Loading an example dataset"?

raghavrv · 2015-11-27T14:46:10Z

Also could you tag this "Question", "Documentation" and reopen it?

GaelVaroquaux · 2015-11-27T14:50:54Z

should we instead add as a section in the tutorial here?

We should reference it. But I don't see this as tutorial material because
it is outside the scope of scikit-learn. We can only give pointers

That's an answer that the users really don't want to hear, because there
point of view is that they have a lump of data and they want it inside
scikit-learn. The answer is: this is not a problem that scikit-learn
solves, go see pandas if you have CSV, scikit-image if you have images,
database connectors (SQLAlchemy?) if you work on databases...

I guess that we should have a sentence like this in the tutorial, where
you reference, with pointers.

As a side note, the kind of errors hit by the users on the thread of this
issue (lack of basic knowledge of Python for instance) tells me that we
cannot solve their problem. They need to go to entry-level tutorials on
Python, and get a bigger picture. Maybe we should make sure that we give
pointers to these in the right spots, eg early on in the tutorial.

MartinLion · 2015-11-27T18:37:41Z

should we instead add as a section in the tutorial here?

We should reference it. But I don't see this as tutorial material because
it is outside the scope of scikit-learn. We can only give pointers

That's an answer that the users really don't want to hear, because there
point of view is that they have a lump of data and they want it inside
scikit-learn. The answer is: this is not a problem that scikit-learn
solves, go see pandas if you have CSV, scikit-image if you have images,
database connectors (SQLAlchemy?) if you work on databases...

I guess that we should have a sentence like this in the tutorial, where
you reference, with pointers.

As a side note, the kind of errors hit by the users on the thread of this
issue (lack of basic knowledge of Python for instance) tells me that we
cannot solve their problem. They need to go to entry-level tutorials on
Python, and get a bigger picture. Maybe we should make sure that we give
pointers to these in the right spots, eg early on in the tutorial.

Well, take it easy!!!

I don't know whether you are one of scikit-learn staff or not, but I need
to say that your way of talking is harming both scikit-learn staff and
users (us), due to the two reasons:

First reason, criticizing people (like what you did) and assuming that they
are novice in Python so they don't know how to work with scikit-learn,
means you or the staff are trying to blind their eyes to the truth that
scikit-learn staff are not able to create a clear tutorial to allow loading
the real data, at least. In addition, pretending that the tutorial of
scikit-learn is perfect in spite all the questions regarding loading the
real data (not the toy data as it is too easy to be imported comparing to
the real data) is something needs to be reconsidered, and this means that
scikit-learn staff don't care about the name of scikit-learn at all.

Second, we can understand from your unsuitable way of talking that you
already forgot that scikit-learn is a product, and we as users are
customers, so either you or the staff of scikit-learn should respect all of
us and thank us for any comment or bug fixing. This is the professional way
of behavior. So I recommend you to think of your words before saying them. If
you are knowing the way of loading the real data and you'd like to help,
don't only say go see pandas, better you answer people's question nicely
rather than hurting them with your words, but if you're simply not able to
do that, so keep quite.

On the other hand, regarding the question "should we instead add as a
section in the tutorial here?", I would like to say "YES", you or
scikit-learn staff should add a section in the official tutorial about how
to load your own data either CSV, or ARFF or text or whatever, as users are
interested to load their own data, this is very critical issue should be
considered in the tutorial (not to be ignored). *If you rely on the user,
then what is your work? *

Nevertheless, for those who are still struggling with scikit-learn, I would
like to say, this is not the end of the world, and as I mentioned
previously, find another to tool make your life much easier. For this
reason, and in order to save your time, I would like to recommend some
tools to assist you in data mining procedures. For instance, Waikato
Environment for Knowledge Analysis (WEKA),
http://www.cs.waikato.ac.nz/ml/weka/, last version is WEKA 3-7-13, is a
collection of machine learning algorithms for data mining tasks. WEKA
allows you to use its schemes either from GUI or writing Java code, so its
very easy for non-programmers. Additional to WEKA, R is also an excellent
tool for data mining stuff, you can also perform tasks of R from WEKA or
vice versa. However, if you have a patience to design a prediction process
manually (drag/drop), RapidMiner is a great tool for this propose where you
can design a very nice flow to achieve your target.

Thanks David van Leeuwen for your support.

Good luck in your analysis.

Cheers,
Martin

—
Reply to this email directly or view it on GitHub
#3808 (comment)
.

raghavrv · 2015-11-27T20:48:25Z

Hey Martin,

Kindly don't be offended.

He did not criticize :) He, being one of the top contributors to scikit-learn has to make tough decisions as to what will go into our codebase and what will not, as a more verbose documentation or tutorial might not be preferable for a lot of people. Gael has in fact contributed a lot of user guides himself to scikit learn to help users.

The reason why he was opposing that addition to the tutorial was that there are multitude of ways in which users have their data stored and such a user guide on how to get the input data from all of them (a text file/csv file/database/zipped archive), is indeed out of scope for scikit learn, which is a machine learning library.

The most important thing to note here is that it is very clearly explained by documentations of libraries which handle data, like numpy or pandas.

It is expected from the user that he or she knows this! Since it seems to not be very clear, he suggests that we add a FAQ, pointing the user to such userguides, which are more elaborate than we could possibly get :)

It may appear that our tutorial could be a bit more elaborate on how the inputs are obtained. But the thing, in general, with userguides is that, it could always be a little bit more elaborate, which makes us set a hard limit on how detailed our userguides can get, to help contain the userguide in a maintainable format :) If you think from that perspective, you yourself would understand our situation.

As this issue is open someone will indeed send a PR soon adding a nice FAQ entry and an example, maybe, which could help clarify your (or any other new user's) doubts on input formats.

Cheers!

emmanuelle · 2015-11-27T21:09:03Z

Hello @MartinLion ,

we understand your eagerness to solve your problem, and your frustration when it is not solved.

However, you seem quite misinformed about what is scikit-learn, how it works, and how the project is developed. Therefore, I would like to make some points clear for you. As you can see from
http://scikit-learn.org/stable/about.html, scikit-learn is a community effort that is developed by a team of volunteers, mostly on their free time. Gaël is one of the creators of the project and its current leader:
scikit-learn would certainly not be the same without his contribution (the same for other volunteers), and he certainly did not deserve your dismissive words.

What I would like to emphasize is that there is no such thing as a scikit-learn "product", or scikit-learn "staff" (only a handful of people have worked full time on the project). You mention "we as users are
customers", but how much are you paying for using scikit-learn? Despite the important development cost, users get scikit-learn for free (and of course that's how it's intended to be). In fact, the development of the project relies on a fragile alchemy: users' needs being a top priority for developers, and users reporting bugs and concerns in the most positive way. The kind of "ranting" that you wrote can be very discouraging for developers, who contribute their free time and their expertise just because they believe that scikit-learn is a useful tool for the community. Some prominent developers stopped contributing to open-source software precisely because of such "customer-like attitude" of a few people underlining only shortcomings, and dismissing the huge development efforts. Please try to see the bright side as well: you received advice and comments from a lot of people, I'm sure that there was something for you to learn out of it, even if it did not solve your problem.

Also, although users' needs are indeed a top priority of scikit-learn (it has an amazing documentation, of which most scientific Python packages can be jealous!), each software addresses a well-targeted niche of users, and it is just normal that scikit-learn cannot fit all users. For example, it is preferable to use scikit-learn with already a good knowledge of Scientific Python. So, I'm really glad that you found a
package that suited your needs better, but please also acknowledge the time and good will that people gave away when answering you.

So, folks, let's all show some good will and keep a constructive dialog.
That's how the project we love will keep on rocking!

AlexanderFabisch · 2015-11-27T23:13:01Z

For this reason, and in order to save your time, I would like to recommend some tools to assist you in data mining procedures. For instance, Waikato Environment for Knowledge Analysis (WEKA), http://www.cs.waikato.ac.nz/ml/weka/, last version is WEKA 3-7-13, is a collection of machine learning algorithms for data mining tasks. WEKA allows you to use its schemes either from GUI or writing Java code, so its very easy for non-programmers. Additional to WEKA, R is also an excellent tool for data mining stuff, you can also perform tasks of R from WEKA or vice versa. However, if you have a patience to design a prediction process manually (drag/drop), RapidMiner is a great tool for this propose where you can design a very nice flow to achieve your target.

Maybe we should make clear that scikit-learn is a Python library. It does not have the same scope as WEKA or RapidMiner. It fits perfectly into the scientific Python ecosystem but you should be willing to write code if you want to use it.

davidavdav · 2015-11-28T09:32:38Z

Perhaps I should elaborate on my original frustration, to give you some context.

I've been programming in Python almost exclusively for a year now (I am a late convert), and am fairly familiar with the ecosystem---I've done lot's of webservice related things, but also manipulation of resources related to automatic speech recognition. I do my scientific work in Julia since a couple of years, and before that, in R, octave, c++/c (some 30 years in total). The Julia ecosystem is quite dynamic, and it is all very exciting, but Python just has this very large ecosystem and very clean coding, which makes it very attractive to use for little side experiments. This time I had to do some topic classification of (single sentence) text documents.

Now there is an abundant choice of language technology tools in Python, and I believe that via lda I got to scikit-learn. Great tutorials, lovely datasets and all, but I found it very difficult to find out how to organize my own data so that I could load this in. Just now, I browsed through the user guide again to find the docs for "load_files", but I could't find an entry. So a google search for "sklearn.datasets.load_files" got me there just now, and I happened to remember the particular module path from more painstaking searches yesterday (it is mentioned somewhere in a tutorial). For me, the essential information would have been: "Organize your data one document per file, one directory per class"---more or less what's under the documentation for load_files. This all makes perfect sense, but I come from a community where usual formats are "one datapoint per line", often with the class label on that line. But having said all this, I am pretty impressed how the Python (text) community has standardized data representation, from what I've seen so far. But perhaps because of the widely used standard data representation, this aspect has naturally less attention in documentation.

As a final note, whenever I try to teach students how to use some scientific tool set or another, I have to spend quite some time on "how to import your data". Nobody likes to do it, it can be a lot of effort for what you potentially use only once, and is therefore always a difficult threshold.

GaelVaroquaux · 2015-11-28T09:58:25Z

@davidavdav

I agree that loading data is a difficult and important thing. However, it is a domain-specific problem. You have a particular type of data. I have another. My data is medical images of brain activity. I can tell you how I organize my data and load them. I can even tell you that we have written a whole package about this, with its own documentation. But that will probably not help you.

What you want is something that tells you how to organize and load your data. Now, it may be that your data is something fairly classic, that many people have; for instance tabular data most often stored in CSV files. In which case there is a need for a package doing this data loading. I don't believe that it should be in scikit-learn. It needs to be in a package that is specialized for this data. For instance, we are not going to put image processing in scikit-learn. For tabular data, the dedicated package is pandas; as I mentioned in my reply we need to point to it. We, the scikit-learn team, want to make plugin pandas into scikit-learn easier. But it is not as easy as it may seem and it takes time (one of our core devs is prototyping something).

I realize rereading your post that your data is most likely text documents. So my two examples of data (medical images and tabular data) were both wrong :$. Maybe the documentation on processing text with scikit-learn could indeed be improved and touch a bit on data organization. I don't know, I very seldom process text. But if you want to do add a few words on this, you are most welcomed to do a pull request. Anyhow, this illustrate my point about the diversity of the data: this whole thread is mostly about loading CSV files, as can be seen from earlier comments (before the thread exploded into a rant). The important thing is not the "CSV", which is the container, but the data model that underlies a CSV file. This data model is that of columns of data with different nature. It's a very different data model than processing text documents.

And finally, you are unhappy that teaching people "how to import your data" is time consuming. I don't think that there is an easy fix for this, even in a specific domain. The reason being that data meaning (ie data semantics) is still very much an open area. It's intrinsically hard to describe what the data means and how it's organized. You can try a simple experience: grab a dataset from someone you don't know, about an experiment you don't know, and try understanding it. Not even loading it, just understanding it. I am sure that it will take time. What takes a human time tends to be very difficult for a computer.

amueller · 2016-09-12T22:52:26Z

Hm I don't think we added pointers to the FAQ yet. It's certainly a FAQ.

chenhe95 · 2016-09-27T21:11:30Z

I wrote a short tutorial on how to get the dataset from a text format to a pandas DataFrame for use by sklearn here http://cis399-he.tumblr.com/post/151024047044/load-your-data-to-scikit-learn

amueller · 2016-10-13T19:05:06Z

fixed in #7516

KamodaP · 2016-10-31T17:15:27Z

Well you could add some information about common errors that happen when loading own data. For example I have bumped on Unknown label type: 'continuous-multioutput' and not a hint, anywhere, what that could mean. I've tried passing:

a numpy array created from csv reader
a numpy array created from pandas
regular array parsed line by line
pandas dataframe
a numpy array created from csv reader converted to regular array
etc...

I'd recommend adding section summarizing assumptions about data, for example it was sheer luck that I found information that data sets may not contain NaN's Nulls etc. Also, what should be the parameters of a numpy array and how should pandas dataframe look like.

jnothman · 2016-11-01T03:56:52Z

Perhaps that error message could be clearer, but I think passing regression targets to a classifier (as in your #7801) is a usage error nothing to do with loading your own dataset.

tursunwali · 2016-11-20T04:47:16Z

@MartinLion @MartinLion
Thank you to brought up this issue.
I am facing your problem now. I want to upload my IRIS like dataset. Which is 500 rows and 16 columns.
and
Thank you @chenhe95 ,
I have seen your tutorial titled "Can I go to the bathroom" , that is a real work , real help to users like us.

And Thank you all developers,
You worked free (or almost free) to deliver this library to people. I appreciate that.

@MartinLion has spirit of learning, I praise that.
Actually stubbornness is a virtue in academic.

MartinLion · 2016-11-20T05:15:00Z

@tursunwali,

Thanks for the kind words.

Please let me know if you still have any issues with your data or learning
process.

Regards,
Martin

On 20 November 2016 at 12:47, tursunwali notifications@github.com wrote:

@MartinLion https://github.com/MartinLion @MartinLion
https://github.com/MartinLion

Thank you to brought up this issue.
I am facing your problem now. I want to upload my IRIS like dataset. Which
is 500 rows and 16 columns.
and
Thank you @chenhe95 https://github.com/chenhe95 ,
I have seen your tutorial titled "Can I go to the bathroom" , that is a
real work , real help to users like us.

And Thank you all developers,
You worked free (or almost free) to deliver this library to people. I
appreciate that.

@MartinLion https://github.com/MartinLion has spirit of learning, I
praise that.
Actually stubbornness is a virtue in academic.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#3808 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AI_3_7E0UU8Os42HXUC5n2eLVYW6hqe6ks5q_9DngaJpZM4Cz90f
.

AlexanderFabisch closed this as completed Oct 28, 2014

raghavrv mentioned this issue Dec 26, 2014

Addition of a new Question / Query label #4005

Closed

GaelVaroquaux reopened this Nov 27, 2015

GaelVaroquaux added Documentation Question labels Nov 27, 2015

amueller added this to the 0.19 milestone Sep 12, 2016

amueller added Easy Well-defined and straightforward way to resolve Need Contributor labels Sep 12, 2016

chenhe95 mentioned this issue Sep 28, 2016

[MRG + 1] Added some documentation for loading external datasets (Issue 3808) #7516

Merged

amueller closed this as completed Oct 13, 2016

scikit-learn deleted a comment from mimilalo99 Feb 9, 2019

scikit-learn locked as too heated and limited conversation to collaborators Feb 9, 2019

loading my own datasets #3808

loading my own datasets #3808

Comments

MartinLion commented Oct 28, 2014

AlexanderFabisch commented Oct 28, 2014

jnothman commented Oct 28, 2014

AlexanderFabisch commented Oct 29, 2014

jnothman commented Oct 29, 2014

MartinLion commented Oct 29, 2014

jnothman commented Oct 29, 2014

MartinLion commented Oct 29, 2014

jnothman commented Oct 29, 2014

Then fit a scikit-learn estimator

MechCoder commented Oct 29, 2014

MartinLion commented Oct 29, 2014

MechCoder commented Oct 29, 2014

MechCoder commented Oct 29, 2014

MartinLion commented Oct 29, 2014

afzal commented Aug 22, 2015

AlexanderFabisch commented Aug 22, 2015

MartinLion commented Aug 22, 2015

jnothman commented Aug 22, 2015

davidavdav commented Nov 27, 2015

raghavrv commented Nov 27, 2015

GaelVaroquaux commented Nov 27, 2015 via email

raghavrv commented Nov 27, 2015

raghavrv commented Nov 27, 2015

GaelVaroquaux commented Nov 27, 2015

MartinLion commented Nov 27, 2015

raghavrv commented Nov 27, 2015

emmanuelle commented Nov 27, 2015

AlexanderFabisch commented Nov 27, 2015

davidavdav commented Nov 28, 2015

GaelVaroquaux commented Nov 28, 2015

amueller commented Sep 12, 2016

chenhe95 commented Sep 27, 2016

amueller commented Oct 13, 2016

KamodaP commented Oct 31, 2016

jnothman commented Nov 1, 2016

tursunwali commented Nov 20, 2016

MartinLion commented Nov 20, 2016