Method best_solution() does not always return correct solution_idx #144

Open
borisarloff opened this issue Nov 14, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@borisarloff

In the event that multiple solutions happen to share the same best fitness value, best_solution() returns the index of the first match it finds with that fitness value. That index, however, does not necessarily correspond to the best solution that is actually returned.

A work-around could be to retrieve the index of the best solution yourself, rather than relying on best_solution() for it. However, this could also fail in the rare case where more than one generation yields the same best solution, since it would be unclear which index is being returned. On the other hand, the ambiguity would not matter when the tied best fitness values come from a deterministic rather than a stochastic fitness function (FF).

To reproduce, create a FF that repeatedly generates the same few fitness values. Call best_solution() and compare its result with the output of this line in the on_generation callback function:

```python
print(f"Generation: {ga_instance.generations_completed} Best solution fitness: {ga_instance.best_solution()[1]}")
```
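As a minimal illustration of the tie-breaking pitfall (my own sketch, not PyGAD's internal code): selecting the best index with a first-match lookup returns the first of several equally fit solutions, which need not be the solution you recorded elsewhere.

```python
import numpy as np

# Hypothetical sketch: several solutions share the maximum fitness.
fitness = np.array([0.3, 0.9, 0.9, 0.5])
population = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])

# A first-match lookup always picks the first tie.
best_idx = np.where(fitness == np.max(fitness))[0][0]  # -> 1 (the first tie)

# If you separately remembered solution [3, 3] (also fitness 0.9),
# index 1 now points at a different solution: [2, 2].
print(best_idx, population[best_idx])
```

With a deterministic FF both tied solutions score the same, so the mix-up is harmless; with a stochastic FF the recorded index and the recorded solution can genuinely disagree.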

@Simonrazer

I have an example where best_solution() returns what appears, as far as I can tell, to be a random index, most of the time 0. Furthermore, saving the best index by hand in the fitness function gives wrong results as well. I am stumped. I am pretty new to machine learning, so please forgive me if my code is ridiculous in any way, and tell me where I went wrong. Any help is greatly appreciated.
Here is main.py, the important part:

```python
import numpy
import pygad
import pygad.nn
import pygad.gann
import ele

def fitness_func(solution, sol_idx):
    global GANN_instance, lastbest, lastbesti
    # Give the NN data and ask it what to do next
    envos = [ele.simulation(sol_idx)]
    data_inputs = numpy.empty(shape=[1, 86], dtype=int)

    # Structure of inputs: current floor, target floor of elevator person[]*max in elevator,
    # target floor of waiting person[floor[max people per floor]]
    data_inputs[0] = envos[0].run([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])  # Run the first iteration to get a data input
    envos[0].score = 0

    # Simulate 20 iterations
    # Indices:                  0   1     2     3 4 5 6 7                                8 9 10 11 12 13 14 15 16 17
    # Structure of predictions: up, down, stop, unload passenger[]*max pass in elevator, load passenger[]*max passenger on floor
    for i in range(20):
        predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[sol_idx],  # What will the network do in this situation?
                                       data_inputs=data_inputs, problem_type="regression")
        data_inputs[0] = envos[0].run(predictions[0])  # Simulate according to the network's instructions and save that as the next input

    # Save the best current network index manually
    if lastbest <= envos[0].score:
        lastbest = envos[0].score
        lastbesti = sol_idx
    return envos[0].score

def callback_generation(ga_instance):
    global GANN_instance

    population_matrices = pygad.gann.population_as_matrices(population_networks=GANN_instance.population_networks,
                                                            population_vectors=ga_instance.population)

    GANN_instance.update_population_trained_weights(population_trained_weights=population_matrices)

    print("Generation = {generation}".format(generation=ga_instance.generations_completed))
    print("Accuracy         = {fitness}".format(fitness=ga_instance.best_solution()[1]))

lastbest = 0
lastbesti = 0

GANN_instance = pygad.gann.GANN(num_solutions=50,
                                num_neurons_input=86,    # Number of data points to respect
                                num_neurons_hidden_layers=[2],
                                num_neurons_output=18,   # Number of possible actions
                                hidden_activations=["relu"],
                                output_activation="sigmoid")

population_vectors = pygad.gann.population_as_vectors(population_networks=GANN_instance.population_networks)

ga_instance = pygad.GA(num_generations=20,
                       num_parents_mating=3,
                       initial_population=population_vectors.copy(),
                       fitness_func=fitness_func,
                       mutation_percent_genes=35,
                       callback_generation=callback_generation)

ga_instance.run()
ga_instance.plot_fitness()

solution, solution_fitness, solution_idx = ga_instance.best_solution()
# print(solution)
print(solution_fitness)  # Result: about 16
print("Solution index acc. to library:", solution_idx, "Solution acc. to comparison:", lastbesti)

print("Score of solution_idx:", fitness_func(0, solution_idx))  # Result of both: all over the place, though always very bad (<0)
print("Score of lastbesti:", fitness_func(0, lastbesti))
```

main.zip
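Since an index only identifies a slot within one generation's population, and slots are reshuffled every generation, a more robust work-around is to record the best solution by value rather than by index. A minimal sketch (my own, version-dependent: it assumes the GA instance exposes `last_generation_fitness` and `population`, as recent PyGAD versions do):

```python
import numpy as np

# Track the best solution by value, not by index. `best` is a
# module-level record updated from the on_generation callback.
best = {"fitness": -np.inf, "solution": None}

def on_generation(ga_instance):
    # Fitness values of the generation that just finished.
    fitness = np.asarray(ga_instance.last_generation_fitness)
    idx = int(np.argmax(fitness))
    if fitness[idx] > best["fitness"]:
        best["fitness"] = float(fitness[idx])
        # Copy the genome itself; the index will be meaningless later.
        best["solution"] = ga_instance.population[idx].copy()
```

After `ga_instance.run()`, `best["solution"]` holds the genome that actually achieved `best["fitness"]`, regardless of how ties or reshuffling affect the index that best_solution() reports.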

@whubaichuan

@borisarloff hi, you talked about the solution_idx; in my test, the solution and the solution fitness can also be wrong.

For example, with a stochastic fitness function, we have the following settings:

    num_generations = 3
    num_parents_mating = 2
    fitness_function = fitness_func
    sol_per_pop = 4
    num_genes = 4
    parent_selection_type = "tournament"
    keep_elitism = 1

generation 0: [0.5,0.5,0.5,0.4]
generation 1: [0.5,0.5,0.6,0.5]
generation 2: [0.8,0.6,0.5,0.5]
generation 3: [0.8,0.6,0.6,0.5]

The fitness returned by best_solution() is [0.6] rather than [0.8]; I am confused.
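The instability is easy to see in isolation. A minimal sketch (plain Python, independent of PyGAD) of why a stochastic FF makes the recorded best unreliable: the same solution scores differently on every call, so a fitness recorded in one generation need not be reproduced when that solution is evaluated again.

```python
import random

# A stochastic fitness function: the score of a fixed solution
# varies from call to call because of the added noise.
def noisy_fitness(solution, rng):
    return sum(solution) + rng.uniform(-0.2, 0.2)

rng = random.Random(0)
solution = [0.2, 0.2]

# Evaluating the identical solution three times gives three different scores.
scores = [noisy_fitness(solution, rng) for _ in range(3)]
print(scores)
```

If the library re-evaluates solutions (for example when re-scoring an elitist copy), the 0.8 seen in generation 2 may simply never be reproduced, and a lower re-drawn value is reported instead.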

@ahmedfgad
Owner

> @borisarloff hi, you talked about the solution_idx; in my test, the solution and the solution fitness can also be wrong.
>
> For example, with a stochastic fitness function, we have the following settings:
>
>     num_generations = 3
>     num_parents_mating = 2
>     fitness_function = fitness_func
>     sol_per_pop = 4
>     num_genes = 4
>     parent_selection_type = "tournament"
>     keep_elitism = 1
>
> generation 0: [0.5,0.5,0.5,0.4]
> generation 1: [0.5,0.5,0.6,0.5]
> generation 2: [0.8,0.6,0.5,0.5]
> generation 3: [0.8,0.6,0.6,0.5]
>
> The fitness returned by best_solution() is [0.6] rather than [0.8]; I am confused.

@whubaichuan,

Could you share a code sample, please? Just to replicate the issue on my end.

@whubaichuan

whubaichuan commented Feb 22, 2023

@ahmedfgad hi, I guess it is due to the same problem you answered here. I think the multiple evaluations of the same solution influence my best fitness across generations, because I am using a stochastic fitness function. By the way, when will the corrected version be available? Thanks a lot.

@ahmedfgad
Owner

@whubaichuan,

This would be the reason why you see this behavior. Please give it a try when the new release is published.

@whubaichuan

whubaichuan commented Feb 22, 2023

@ahmedfgad

> problem solved in PyGAD new release (2.19.2)

Yes, here is the output of best_solutions_fitness (there are 10 generations and keep_elitism is larger than 0):

[0.7014856849397931, 0.7132997342518398, 0.7557456777209327, 0.7557456777209327, 0.7429285673868089, 0.7429285673868089, 0.7429285673868089, 0.7429285673868089, 0.7429285673868089, 0.7429285673868089, 0.7429285673868089]

Here, 0.7557456777209327 is not kept until the end. Same problem here
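The drop from 0.7557… to 0.7429… is the signature of re-scoring the elite with a stochastic FF. As an illustration of the expected fix (my own sketch, not PyGAD's actual code): if the elite's recorded fitness is cached instead of re-drawn each generation, the per-generation best becomes a running maximum and can never decrease.

```python
import random

rng = random.Random(7)
# One noisy "best candidate score" per generation.
draws = [rng.uniform(0, 1) for _ in range(10)]

# Re-evaluated elite: the reported best is just this generation's
# fresh draw, so it can drop from one generation to the next.
reevaluated = draws

# Cached elite: the reported best is the running maximum of the
# draws, which is non-decreasing by construction.
cached, best = [], float("-inf")
for score in draws:
    best = max(best, score)  # the elite keeps its recorded fitness
    cached.append(best)
print(cached)
```

With caching, a value like 0.7557… would be carried to the end of best_solutions_fitness instead of being replaced by a lower re-drawn score.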

@ahmedfgad ahmedfgad added the bug Something isn't working label Feb 25, 2023