
Incorrect predictions when fitting a LogisticRegression model on binary outcomes with multi_class='multinomial'. #9889

Closed
@rwolst

Description


Incorrect predictions when fitting a LogisticRegression model on binary outcomes with multi_class='multinomial'.

Steps/Code to Reproduce

    from sklearn.linear_model import LogisticRegression
    import sklearn.metrics
    import numpy as np

    # Set up a logistic regression object
    lr = LogisticRegression(C=1000000, multi_class='multinomial',
                            solver='sag', tol=0.0001, warm_start=False,
                            verbose=0)

    # Set independent variable values
    Z = np.array([
       [ 0.        ,  0.        ],
       [ 1.33448632,  0.        ],
       [ 1.48790105, -0.33289528],
       [-0.47953866, -0.61499779],
       [ 1.55548163,  1.14414766],
       [-0.31476657, -1.29024053],
       [-1.40220786, -0.26316645],
       [ 2.227822  , -0.75403668],
       [-0.78170885, -1.66963585],
       [ 2.24057471, -0.74555021],
       [-1.74809665,  2.25340192],
       [-1.74958841,  2.2566389 ],
       [ 2.25984734, -1.75106702],
       [ 0.50598996, -0.77338402],
       [ 1.21968303,  0.57530831],
       [ 1.65370219, -0.36647173],
       [ 0.66569897,  1.77740068],
       [-0.37088553, -0.92379819],
       [-1.17757946, -0.25393047],
       [-1.624227  ,  0.71525192]])
    
    # Set dependent variable values
    Y = np.array([1, 0, 0, 1, 0, 0, 0, 0, 
                  0, 0, 1, 1, 1, 0, 0, 1, 
                  0, 0, 1, 1], dtype=np.int32)

    lr.fit(Z, Y)
    p = lr.predict_proba(Z)
    print(sklearn.metrics.log_loss(Y, p))  # prints 0.61505641264 (see Actual Results)

    print(lr.intercept_)
    print(lr.coef_)

Expected Results

If we compare against R, or against the same model fitted with multi_class='ovr', we expect the log loss (which is approximately proportional to the objective function, since the regularisation is made negligible by the large value of C) to be roughly 0.5922995.
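For reference, the ovr figure can be reproduced with the snippet below (a minimal sketch reusing Z, Y and the solver settings from the reproduction code above; lr_ovr is just an illustrative name):

    # Same data and solver settings as above, but with one-vs-rest
    lr_ovr = LogisticRegression(C=1000000, multi_class='ovr',
                                solver='sag', tol=0.0001)
    lr_ovr.fit(Z, Y)
    print(sklearn.metrics.log_loss(Y, lr_ovr.predict_proba(Z)))  # roughly 0.5922995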

Actual Results

The actual log loss when using multi_class='multinomial' is 0.61505641264.

Further Information

See the Stack Exchange question https://stats.stackexchange.com/questions/306886/confusing-behaviour-of-scikit-learn-logistic-regression-multinomial-optimisation?noredirect=1#comment583412_306886 for more information.

The issue, it seems, is caused at https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/linear_model/logistic.py#L762. In the multinomial case, even when classes.size == 2, we cannot reduce to a 1D problem by throwing away one of the two vectors of coefficients, as we can in ordinary binary logistic regression. This is essentially the difference between softmax (where the parametrisation is redundant) and plain logistic regression.
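To make the redundancy concrete, here is a minimal NumPy sketch (the inputs x and coefficients w are made-up values, purely for illustration): a two-class softmax with the symmetric rows (w, -w) matches a sigmoid on the difference of the rows, 2*w, but not a sigmoid on either row alone.

    import numpy as np

    rng = np.random.RandomState(0)
    x = rng.randn(5, 2)          # arbitrary inputs (made up for illustration)
    w = np.array([0.7, -1.2])    # arbitrary coefficient vector

    # Two-class softmax with the symmetric parametrisation (w, -w)
    logits = np.stack([x @ w, -(x @ w)], axis=1)
    softmax = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    # Collapsing correctly requires the *difference* of the rows, i.e. 2*w ...
    print(np.allclose(softmax[:, 0], 1 / (1 + np.exp(-x @ (2 * w)))))  # True

    # ... whereas simply throwing one row away (keeping w alone) does not match
    print(np.allclose(softmax[:, 0], 1 / (1 + np.exp(-x @ w))))        # False

In other words, discarding one coefficient vector effectively halves the coefficients entering the sigmoid, so the reduced model is not the same model.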

This can be fixed by commenting out lines 762 and 763. However, I am apprehensive that doing so may cause other, unknown issues, which is why I am posting this as a bug.

Versions

Linux-4.10.0-33-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0

Metadata


    Labels

    Bug, Easy (well-defined and straightforward way to resolve), help wanted
