
Learning to Solve Constraint Problems

2007


Susan L. Epstein (1,2) and Smiljana Petrovic (1)
(1) Department of Computer Science, The Graduate Center of The City University of New York, NY, USA
(2) Department of Computer Science, Hunter College of The City University of New York, NY, USA
spetrovic@gc.cuny.edu, susan.epstein@hunter.cuny.edu

Abstract

This paper explains why learning to solve constraint problems is so difficult, and describes a set of methods that has been effective on a broad variety of problem classes. The primary focus is on learning an effective search algorithm as a weighted subset of ordering heuristics. Experiments show the impact of several novel techniques on a variety of problems.

When a planning problem is cast as a constraint satisfaction problem (CSP), it can use the representational expressiveness and inference power inherent in constraint programming (Nareyek et al. 2005). If such an encoding lacks the necessary planning knowledge, a program might learn effective solution methods. Our thesis is that it is possible to learn to solve constraint satisfaction problems from experience. In this scenario, a program is given a set of CSPs and a set of search heuristics. It is then expected to learn an effective search algorithm, represented as a weighted combination of some subset of those heuristics. From our perspective, the scenario's principal challenge is a plethora of purportedly "good" heuristics: heuristics to select variables or values, heuristics for inference, heuristics to determine when to restart. The focus here is on heuristics for traditional global search. After fundamental definitions and related work, this paper addresses the differences among search order heuristics, the power of mixtures of heuristics, and fundamental issues in learning to solve a class of CSPs. It then describes two algorithms for learning such mixtures, and additional learning methods that speed learning and often improve search performance.

Background and related work

A CSP is a set of variables, each with a domain of values, and a set of constraints, expressed as relations over subsets of those variables. CSP papers often present results on a class of CSPs, that is, a set of putatively similar problems. For example, a class of Model B problems is characterized by <n, m, d, t>, where n is the number of variables, m the maximum domain size, d the density (fraction of edges out of the n(n-1)/2 possible edges), and t the tightness (fraction of possible value pairs that each constraint excludes) (Gomes et al. 2004). A problem class can also mandate some non-random structure on its problems. For example, a composed problem consists of a subgraph called its central component loosely joined to one or more subgraphs called satellites (Aardal et al. 2003).

In a binary CSP, all constraints are on at most two variables. A binary CSP can be represented as a constraint graph, where vertices correspond to the variables (labeled by their domains), and each edge represents a constraint between its respective variables. Although the work reported here is on binary CSPs, in principle that is not a restriction. A solution to a CSP is an instantiation of all its variables that satisfies all the constraints. Here, search for a solution iteratively selects a variable and assigns it a value from its domain, producing a search node. After each assignment, some form of inference detects values that are incompatible with the current instantiation.
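To make the <n, m, d, t> parameterization above concrete, here is a minimal sketch of a Model B-style generator. The function name, the use of Python's random module, and the exact rounding of the edge and pair counts are our own assumptions; this is not the generator used for the experiments reported below.

    import itertools
    import random

    def model_b_instance(n, m, d, t, seed=None):
        """Sample one binary CSP from a Model B class <n, m, d, t>:
        n variables with domains {0..m-1}; a fraction d of the n(n-1)/2
        possible edges carry constraints; each constraint excludes a
        fraction t of the m*m value pairs."""
        rng = random.Random(seed)
        domains = {v: set(range(m)) for v in range(n)}
        possible_edges = list(itertools.combinations(range(n), 2))
        edges = rng.sample(possible_edges, round(d * len(possible_edges)))
        value_pairs = list(itertools.product(range(m), repeat=2))
        constraints = {}
        for (x, y) in edges:
            excluded = rng.sample(value_pairs, round(t * len(value_pairs)))
            constraints[(x, y)] = set(value_pairs) - set(excluded)  # allowed pairs
        return domains, constraints

    # For example, one problem from the 30-8 class used later in the paper:
    domains, constraints = model_b_instance(30, 8, 0.26, 0.34, seed=0)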
We use the MAC-3 inference algorithm to maintain arc consistency during search (Sabin et al. 1997). MAC-3 temporarily removes currently unsupportable values to calculate dynamic domains that reflect the current instantiation. If every value in some variable's domain is inconsistent (violates some constraint), then the current instantiation cannot be extended to a solution and some retraction method is applied. Retraction here is chronological backtracking: it prunes the subtree (digression) rooted at the inconsistent node and withdraws the most recent value assignment(s).

All data was generated with ACE (the Adaptive Constraint Engine). ACE learns a customized combination of pre-specified search heuristics for a class of CSPs (Epstein et al. 2005a). It attempts to solve some sequence of these problems within a specified resource limit (the learning phase). Then learning is turned off and the program attempts to solve a new sequence of problems drawn from the same set (the testing phase). A run is a learning phase followed by a testing phase. The resource limit is measured in steps (the number of variable selections and value selections). The ability of a program to learn to solve CSPs is gauged here by the number of problems solved, the number of search steps, and the search tree size in nodes, averaged over a set of runs.

The premise of learning is that data can be calculated, stored, and applied to improve performance. Thus, it is reasonable to learn how to solve a class of problems only if the effort expended, both to learn and to apply learned knowledge, can be justified by a frequent need to solve similar problems. For easy problems, learning is a waste of resources: the search algorithm should recognize and apply a simple, effective approach from its arsenal. On a class of more challenging problems, learning may be worthwhile if one expects to solve such problems often. Indeed, proponents of any new search algorithm inherently argue that most CSPs are enough like one another that success on some set of classes, as described in their papers, bodes well for other classes yet untested. On a class of hard problems, learning is appealing because the given search algorithm is slower than one would wish, and thus the class is "hard" for the search algorithm at hand.

Historically, most learning for constraint solving has been on an individual problem rather than on an entire class. Such learning has primarily focused either on inconsistent partial instantiations that should be avoided or on constraints that provoke retraction (Dechter et al. 1987; Dechter 2003; Boussemart et al. 2004). Other work has learned weights for individual assignments (Refalo 2004), alternated among methods while solving an individual problem (Borrett et al. 1996), identified problematic edges with a preliminary local search (Ruml 2001; Eisenberg et al. 2003; Hoos et al. 2004), learned global constraints (Bessière et al. 2001; Bessière 2007), or addressed optimization problems and incomplete methods (Caseau et al. 1999; Caseau et al. 2004; Carchrae et al. 2005).
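To make the search procedure and inference described at the start of this section concrete, the following is a minimal sketch of depth-first search with chronological backtracking and a simplified AC-3-style propagation pass. The representation matches the generator sketch above; the naive min-domain variable ordering, lexical value ordering, and all names are ours, and MAC-3's optimizations, the step limit, and ACE's ordering Advisors are omitted.

    def allowed(constraints, x, vx, y, vy):
        """True unless a binary constraint forbids x=vx together with y=vy."""
        if (x, y) in constraints:
            return (vx, vy) in constraints[(x, y)]
        if (y, x) in constraints:
            return (vy, vx) in constraints[(y, x)]
        return True                                   # no constraint between x and y

    def revise(domains, constraints, x, y):
        """Remove values of x that have no support in y's dynamic domain."""
        removed = {vx for vx in domains[x]
                   if not any(allowed(constraints, x, vx, y, vy) for vy in domains[y])}
        domains[x] -= removed
        return bool(removed)

    def propagate(domains, constraints):
        """AC-3-style propagation to a fixed point; False signals a domain wipeout."""
        neighbors = {v: set() for v in domains}
        for (x, y) in constraints:
            neighbors[x].add(y)
            neighbors[y].add(x)
        queue = [(x, y) for x in domains for y in neighbors[x]]
        while queue:
            x, y = queue.pop()
            if revise(domains, constraints, x, y):
                if not domains[x]:
                    return False                      # inconsistent: retraction needed
                queue.extend((z, x) for z in neighbors[x] if z != y)
        return True

    def solve(domains, constraints, assignment=None):
        """Depth-first search with chronological backtracking; a solution or None."""
        assignment = assignment or {}
        if len(assignment) == len(domains):
            return assignment
        # Naive ordering for illustration: smallest dynamic domain first.
        var = min((v for v in domains if v not in assignment),
                  key=lambda v: len(domains[v]))
        for value in sorted(domains[var]):
            trial = {v: set(dom) for v, dom in domains.items()}
            trial[var] = {value}
            if propagate(trial, constraints):         # inference after each assignment
                result = solve(trial, constraints, {**assignment, var: value})
                if result is not None:
                    return result
        return None                                   # all values failed: backtrack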
Table 2: Performance of 3 popular heuristics (the first of each pair) and their duals on 50 Comp problems (described in the text) under a 100,000-step limit. Observe how much better the duals perform on problems from this class.

Heuristic             Unsolved problems   Steps
Max degree            9                   19,901.76
Min degree            0                   64.60
Max forward-degree    4                   10,590.64
Min forward-degree    0                   64.50
Min domain/degree     7                   15,558.28
Max domain/degree     4                   10,922.82

The argument for multiple heuristics

Despite enthusiasm for them in the CSP literature, ordering heuristics (those that select variables and values for them) display surprisingly uneven performance. Consider, for example, the performance of the variable selection heuristics in Table 1. (Definitions for all heuristics appear in the Appendix.) Even well-trusted individual heuristics such as these vary dramatically in their performance. For example, max-weighted-degree (Boussemart et al. 2004) is among the best individual heuristics when the number of variables is substantially larger than the maximum domain size (e.g., 50-10). It appears to be less effective, however, when there are more potential values than variables (e.g., 20-30).

Table 1: Search tree size under individual heuristics on 50 problems from each of three randomly-generated Model B classes: <50, 10, 0.38, 0.2>, <20, 30, 0.444, 0.5>, and <30, 8, 0.26, 0.34> (referred to hereon as 50-10, 20-30, and 30-8, respectively).

Heuristic               30-8    20-30     50-10
min-domain              563     10,411    51,347
max-degree              206     5,267     46,347
max-forward-degree      220     10,150    43,890
min-domain/degree       234     4,194     35,175
max-weighted-degree     223     5,897     30,956
min-dom/dynamic-deg     211     3,942     30,791
min-dom/weighted-deg    205     4,090     30,025

Perhaps more surprising is that the opposite of a popular heuristic may be considerably more effective than the original. Let a metric be a function from a set of choices (variables or values) to the real numbers. A metric returns a score for each choice. An ordering heuristic is thus a preference for one extreme or the other of the scores returned by its metric. A dual for a heuristic reverses the import of its metric (e.g., max-domain is the dual of min-domain). Duals of popular heuristics may outperform them on real-world problems and on problems with non-random structure (Petrie et al. 2003; Lecoutre et al. 2004; Otten et al. 2006). For example, each composed problem in Comp has a Model B central component from <22, 6, 0.6, 0.1> linked to a single Model B satellite from <8, 6, 0.72, 0.45> by edges with density 0.115 and tightness 0.05. The central component is substantially larger, with lower tightness and lower density than its satellite. These CSPs are particularly difficult for some traditional heuristics. For example, max-degree tends to select variables from the central component, while the decidedly untraditional min-degree tends to prefer variables from the satellite and thereby detects inconsistencies much earlier. Table 2 shows how three traditional heuristics and their duals fare on Comp. Surprisingly, the simplest duals do by far the best. This is of particular concern because the structural features of Comp often appear in real-world problems.

In practice, a good mixture of heuristics can outperform even the best individual one, as Table 3 demonstrates. The first line shows the best performance achieved by any traditional single heuristic from Table 1. The second line of Table 3 shows that a good pair of heuristics, one for variable ordering and the other for value ordering, can perform significantly better than an individual heuristic. Nonetheless, the identification of such a pair is not trivial. For example, max-product-domain-value better complements min-domain/dynamic-degree than it does max-weighted-degree. The last line demonstrates that combinations of more than two heuristics can further improve performance.

Table 3: Search tree size under individual heuristics and under mixtures of heuristics on three classes of problems. ACE learns a different, high-performing mixture of more than two heuristics for each of these classes.

Mixture                                              30-8    20-30    50-10
The best heuristic from Table 1                      205     3,942    30,025
Min dom/dynamic degree + Max Product Domain Value    156     2,764    15,091
Max-weighted-degree + Max Product Domain Value       179     3,892    22,273
Mixture found by ACE                                 141     2,502    12,120
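The notions of metric and dual are easy to state operationally. Here is a minimal sketch; the helper names are ours, and the static degree computation assumes the binary constraints are keyed by variable pairs as in the earlier sketches.

    def degree(var, constraints):
        """Metric: static degree, the number of constraints on var."""
        return sum(1 for edge in constraints if var in edge)

    def prefer(metric, maximize):
        """Wrap a metric as an ordering heuristic; flipping `maximize` yields its dual."""
        def choose(candidates, constraints):
            key = lambda v: metric(v, constraints)
            return max(candidates, key=key) if maximize else min(candidates, key=key)
        return choose

    max_degree = prefer(degree, maximize=True)    # the traditional heuristic
    min_degree = prefer(degree, maximize=False)   # its dual, far better on Comp (Table 2)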
Given these results, a program required to learn effective search without knowledge about problem structure should be provided with many popular heuristics, along with their duals. ACE's heuristics, each with its own metric, are gleaned from the CSP literature. To make a decision during search, ACE uses a weighted mixture of expressions of preference from a large number of such heuristics. This is a difficult task.

Why learning on a class of problems is hard

Without an instructor to provide examples of good and bad decisions, learning in our scenario is self-supervised, that is, the learner must assess both the quality of its own actions and the adequacy of its model of the environment. The <search node, decision> pairs from a solver's trace provide self-generated training instances. Reinforcement learning rewards or penalizes heuristics based on their ability to provide good search advice (Sutton & Barto 1998), but in this context it faces a variety of difficulties.

A solution path may not provide good training instances. Since every variable must be assigned a value, any variable ordering must eventually lead to a solution if the problem is solvable. Nonetheless, some variable orders generate substantially fewer nodes, and may be more effective by several orders of magnitude; those are the ones we want our learner to produce. Self-generated training instances, however, may not necessarily represent good variable choices. Moreover, any variable ordering can lead to an error-free solution if each chosen value satisfies all constraints. As a result, the ease with which a solution is found is not a reliable criterion for evaluating the quality of the decisions that led to it.

The difficulty of a problem is hard to assess. Training instances must be drawn from the same population as testing instances, but a class of CSPs is only putatively similar. For a given search algorithm, in some circumstances the distribution of difficulty within a class is heavy-tailed (Hulubei et al. 2005). Thus some problems will be extremely difficult, while others will be manageable, or even easy. When a learner confronts a CSP from a class, it is hard to predict how amenable the particular problem will be to the search algorithm. This issue arises whether or not the problems are "hard" in some fundamental sense. Variation in difficulty is not noise; it is inherent in the problems themselves and in their interaction with heuristics. In learning to solve CSPs, the skewed distribution of difficulty within a problem class (as the result, perhaps, of an inappropriate heuristic) poses a particular challenge that is only exacerbated by more difficult classes.
The difficulty of a problem class is hard to assess. In Model B problems, for fixed values of n and m, there are value combinations for d and t that make the entire class of problems difficult in some fundamental sense (the phase transition) (Cheeseman et al. 1991). Even in a class at the phase transition (as are the classes in Table 3) there may be a wide range of difficulty, so that individual problems could give a misleading picture of the class as a whole. In theory, one could assess the difficulty of a class using standard algorithms on a sample drawn from it, and thereby characterize the relative difficulty of problems with different parameter values. More generally, however, particularly in real-world contexts, this may not be possible beforehand. In such situations, the previously described difficulties of learning from learner-generated solution paths may be magnified.

The severity of an error is costly to assess. An error is a value assignment that is eventually retracted during search. Typically, even a handcrafted CSP solver arrives at a solution only after a lengthy series of errors. To penalize incorrect decisions appropriately, one should assess the severity of each error. Effectively, any incorrect decision creates an unsolvable problem. When a good solver errs, it will quickly discover its error. Gauging the effectiveness of error recovery, however, requires exploration of every possible ordering of value assignments in the digression, an unreasonable computational burden.

Errors may not be immediately apparent. An important issue in credit/blame assignment for reinforcement learning is that most retractions appear at some distance from the root of the search tree. In fact, even for hard but solvable problems, there are usually relatively few retractions at the top of the search tree, even with maintained arc consistency. Retractions often begin only after several decisions have been made. In such searches, the impact of bad decisions, especially variable selections, appears only after several more decisions have been made. As a result, it is difficult to assign blame to the true culprits.

Implications for learning. In summary, given a set of search traces, it is difficult to gauge how representative they are of effective search, difficult to identify sources of inefficiency from errors alone, and difficult to gauge how severe the errors are, how hard an individual problem is (despite its class designation), and even the degree to which a solution is based on good decisions. Moreover, since CSP solution is NP-complete, there can be no "gold standard" by which to judge the quality of a heuristic; the perfect search path must be assumed to be unobtainable on a regular basis. Clearly, for a program expected to learn an effective search algorithm based only on its own problem-solving experience, the interpretation of success and failure is not straightforward. Even the worst heuristic can solve some problems quickly. If such problems occur early in learning, then an ineffective heuristic will deceptively appear to be effective. If poor heuristics are reinforced early in learning, they will inevitably lead to poor performance on some subsequent problems.

Learning a mixture of heuristics

ACE is based on FORR, an architecture for the development of expertise from multiple heuristics (Epstein 1994). ACE learns a customized weighted mixture of pre-specified heuristics for any given class.
Guided by its ordering heuristics (here, Advisors), ACE solves problems in a given class and uses that experience to learn a weight profile (a set of weights for the Advisors). To select a variable or a value, ACE consults its Advisors. Each Advisor A_i expresses the strength s_ij of its preference for choice c_j. Based on the weight profile, the choice with the highest weighted sum of Advisor strengths is selected:

    argmax_j  Σ_i  w_i s_ij        [1]

Initially, all weights are set to 0.05. During learning, ACE gleans training instances from its own (likely imperfect) successful searches. Positive training instances come from the error-free path to a solution. Negative training instances are incorrect value selections, as well as variable selections after which a value assignment fails. Decisions made within a digression are not considered. After each successful search, ACE extracts training instances from the trace, and updates the weight profile with a weight-learning algorithm before it goes on to the next problem.

Weights are based on the historical frequency with which an Advisor agreed with positive training instances and disagreed with negative ones, taken as a ratio because an Advisor may not always comment (express a preference) on a training instance. The learning algorithm presents each training instance, along with the possible actions available, to each Advisor. If an Advisor can discriminate among these actions by its comment strengths, its weight is adjusted: increased if it supports the correct decision, decreased otherwise. The actual weights are more than mere frequencies, however. A reward (the increment to the numerator) is not necessarily 1, and a penalty (the decrement to the numerator) can be substantial.

ACE has two effective algorithms that learn a weighted mixture of ordering heuristics for a class of CSPs. In DWL (Digression-based Weight Learning), an Advisor supports a decision if it gives that choice its highest rank. DWL rewards and penalizes search choices based on the number of nodes in the search tree, the size of its digressions, and performance on the preceding problems. RSWL (Relative Support Weight Learning) considers all heuristics' preferences when a decision is made. Weight reinforcements under RSWL depend upon the normalized difference between the strength the Advisor assigned to that decision and the average strength it assigned to all available choices (relative support). An Advisor supports a decision if its relative support is positive. Under RSWL, rewards and penalties are proportional to relative support.
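Equation 1 and RSWL's relative support can be sketched directly. This is a minimal illustration under our own naming, with an assumed range normalization; it omits ACE's actual reward and penalty schedule.

    def weighted_choice(candidates, advisors, weights):
        """Equation 1: select the candidate with the highest weighted sum of
        Advisor strengths. Each Advisor maps the candidates it comments on to a
        strength s_ij and may abstain by returning an empty dict."""
        totals = {c: 0.0 for c in candidates}
        for name, advise in advisors.items():
            for c, s in advise(candidates).items():
                totals[c] += weights[name] * s
        return max(candidates, key=lambda c: totals[c])

    def relative_support(strengths, chosen):
        """RSWL: the difference between the strength an Advisor gave the chosen
        candidate and its average strength over the candidates it commented on,
        normalized here (our assumption) by the spread of its strengths."""
        if chosen not in strengths:
            return None                               # the Advisor did not comment
        mean = sum(strengths.values()) / len(strengths)
        spread = (max(strengths.values()) - min(strengths.values())) or 1.0
        return (strengths[chosen] - mean) / spread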
Table 4: ACE's average steps to solution, with and without full restart. A lower step limit without full restart gives uneven performance.

Class                  <30, 8, 0.31, 0.34>                       <30, 8, 0.18, 0.5>
Restart strategy       None      None      4 out of 7 failed     None     None      4 out of 7 failed
Learning step limit    20000     2000      500                   10000    1000      500
Run 1                  145.13    145.12    144.47                71.80    3324.42   73.00
Run 2                  149.17    150.10    147.85                71.07    71.38     72.37
Run 3                  163.28    6541.17   163.98                69.72    69.72     71.95
Run 4                  146.85    151.63    152.73                70.85    70.43     73.23
Run 5                  153.25    6373.50   156.27                71.53    71.92     71.97
Run 6                  144.30    144.02    154.63                71.43    72.43     75.82
Run 7                  154.90    157.73    158.10                72.37    71.42     71.50
Run 8                  150.27    154.55    153.25                69.75    73.87     72.43
Run 9                  135.93    157.68    162.58                71.25    3370.53   72.78
Run 10                 150.77    154.25    158.00                69.90    71.62     73.20

Important learning mechanisms

The multitude of available CSP heuristics, their idiosyncratic applicability, and the issues described earlier drove the development of several important general learning mechanisms, summarized here. Although these approaches are not limited to CSP, they are essential for good learning performance as envisioned here. All of the following experiments with ACE average results over 10 runs and use 42 Advisors (described in the Appendix: 28 for variable ordering and 14 for value ordering).

Full restart. When class-inappropriate heuristics acquire high weights early in training, they often control the subsequent decisions and repeatedly fail to solve problems. Full restart recognizes that the current learning attempt is not promising, abandons the responsible training problems, and restarts the entire learning process with a freshly-initialized weight profile (Petrovic et al. 2006). Without full restart, reduced learning resources (the learning step limit) produce occasional unsatisfactory runs. With an appropriate full restart strategy, however, learning resources can be reduced by an order of magnitude without compromising performance. Full restart has proved most effective when it responds to the frequency of recent problem failure and when learning terminates after some number of consecutive solved problems. Table 4 demonstrates the power of full restart: ACE monitored its own reliability during the learning phase, and failure on 4 of the last 7 problems triggered a full restart. During the learning phase, ACE was required to try to solve 30 problems in its current full-restart attempt. Problems were never reused during learning, even under full restart.
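The monitoring policy behind full restart is simple to express. Below is a minimal schematic with our own naming, where try_to_solve stands in for a full learning attempt on one problem; the 4-of-7 trigger and the 30-problem budget mirror the experiment above, and the rest is illustrative only.

    from collections import deque

    def learn_with_full_restart(problems, try_to_solve, initial_weights,
                                window=7, failure_trigger=4, problems_per_attempt=30):
        """Learning phase with full restart: if too many recent problems fail,
        discard the learned weight profile and begin a fresh attempt.
        Problems are never reused, even after a restart."""
        weights = dict(initial_weights)
        recent = deque(maxlen=window)        # rolling record of recent success/failure
        attempted = 0
        for problem in problems:
            solved = try_to_solve(problem, weights)   # updates weights on success
            recent.append(solved)
            attempted += 1
            if list(recent).count(False) >= failure_trigger:
                weights = dict(initial_weights)       # full restart: fresh weight profile
                recent.clear()
                attempted = 0
            elif attempted >= problems_per_attempt:
                break                                 # current attempt used its budget
        return weights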
Random subsets. Given an initial set of heuristics that is large and inconsistent, many class-inappropriate heuristics may combine to make bad choices, and thereby make it difficult to solve any problem within a given step limit. Because only solved problems provide training instances for weight learning, no learning can take place until some problem is solved. Random subsets have proved a successful approach to this issue: rather than consult all of its Advisors at once, ACE randomly selects a new subset of Advisors for each problem, consults them, makes decisions based on their comments, and updates only their weights (Petrovic et al. 2007b). During the experiments in Table 5, for each problem in the learning phase, r of the variable-ordering Advisors and r of the value-ordering Advisors were selected without replacement to make decisions during search on that problem. (For size 20%-80%, a random r in [0.2, 0.8] was generated first.) Random subsets reduce early failures (problems unsolved before any weights were learned) during learning. They also reduce search tree size during testing, and increase the number of solved problems during both learning and testing, as shown in Table 5.

Table 5: Random subsets (r < 100%) improve performance. (*) indicates that only 2 runs were completed.

                            Learning                       Testing
Class    Subset size r      Early failures   Unsolved      Unsolved   Steps
50-10    100%               52.60%           42.41%        31%        36,835.70
50-10    30%                32.20%           11.89%        0%         3,608.27
50-10    70%                14.56%           9.48%         0%         3,962.62
50-10    20%-80%            24.37%           7.72%         0%         3,888.62
20-30    100% (*)           93.38%           58.10%        97.5%      97,972.85
20-30    30%                31.64%           26.63%        1%         15,599.71
20-30    70%                26.45%           23.27%        10%        23,499.99
20-30    20%-80%            27.43%           20.29%        0%         13,206.32

Fewer heuristics. A benchmark Advisor expresses random preferences over the same set of choices an Advisor faces. ACE has two such benchmarks, one for variable ordering and the other for value ordering. Benchmarks are excluded from decision-making, but weights are learned for them. To speed performance during testing, ACE uses only those Advisors whose learned weight exceeds that of their respective benchmarks. This typically eliminates about half the initial Advisors. We have experimented with further reductions in the number of Advisors during testing, as shown in Table 6. The more extensive reductions eventually increased search tree sizes for the 20-30 problems. For the 50-10 problems, however, the search tree size remained stable with a 31% speedup. We believe the explanation lies in the nature of the problems themselves. When there are many values compared to the number of variables, despite inference with MAC-3, domains remain large and many values still share the same scores (and strengths). With too few value-ordering Advisors, ties among value choices occur more often, so that random selection among tied values is more likely, making search decisions less prescient.

Table 6: Search tree size, number of solved problems, and time comparison with fewer Advisors during testing. Bold values are statistically significant performance reductions compared to the traditional benchmark approach (>bmk).

                            20-30                          50-10
Var. Adv.   Val. Adv.       Steps    Solved   Time         Steps     Solved   Time
>bmk        >bmk            2,864    100%     100%         19,689    91%      100%
8           4               2,956    100%     71%          19,923    93%      84%
8           2               2,930    100%     55%          19,372    93%      70%
4           4               3,198    100%     87%          19,265    94%      77%
4           2               3,176    100%     67%          19,428    94%      69%

Borda-based voting. The metrics that underlie heuristics embody domain knowledge that reflects preferences among choices. Simple ranking ignores the degree of metric difference, linear interpolation attends to relative differences among scores, and exponential methods stress choices with higher scores while they dramatically reduce the influence of low-scoring choices. Two preference expression methods inspired by the Borda voting literature in political science (Brams et al. 2002) consider relative positions among scores and have proven particularly reliable (Petrovic et al. 2007a). When preference for fewer heuristics and full restart are combined with more sensitive expression of those Advisors' preferences, it is possible to significantly reduce both computation time and the size of the search tree on difficult problems.
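One way to express rank-sensitive preferences in the Borda spirit is to let each candidate's strength be the number of candidates it beats on the metric, so that only relative positions matter. This is a minimal sketch of that idea, not the specific variants evaluated in (Petrovic et al. 2007a).

    def borda_strengths(scores, maximize=True):
        """Convert metric scores into Borda-like strengths: each candidate's
        strength is the number of candidates it beats; tied candidates share
        the same strength, and the raw score gaps are ignored."""
        better = (lambda a, b: a > b) if maximize else (lambda a, b: a < b)
        return {c: sum(1 for other in scores.values() if better(s, other))
                for c, s in scores.items()}

    # Example: raw degrees {A: 7, B: 3, C: 3, D: 1} become strengths
    # {A: 3, B: 1, C: 1, D: 0}, however far apart the raw scores are.
    print(borda_strengths({"A": 7, "B": 3, "C": 3, "D": 1}))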
Inference policy. Inference is intended to remove from consideration values that will not lead to a solution. An inference policy includes preprocessing, selection of an inference method, identification of relevant method parameters, and switching among methods. In pioneering work with ACE, we have shown the significant impact such a policy has on solution time, and that the choice of a good policy varies with both the problem class and the search order heuristics (Epstein et al. 2005b). We have also demonstrated how an inference policy can be learned automatically and can substantially improve performance. The most effective methods thus far are those that monitor and respond to domain changes after instantiations. They do less work than full arc consistency without reducing search performance.

Table 7 compares this combination of learning methods with the single heuristics from Table 1.

Table 7: Reduced search tree size with the full complement of learning methods described here, compared to the traditional single-heuristic approaches from Table 1. ACE values here are averaged over 10 runs; ACE Table 3 values are best individual runs.

Heuristic                30-8    20-30     50-10
min-domain               563     10,411    51,347
max-degree               206     5,267     46,347
max-forward-degree       220     10,150    43,890
min-domain/degree        234     4,194     35,175
max-weighted-degree      223     5,897     30,956
min-dom/dynamic-deg      211     3,942     30,791
min-dom/weighted-deg     205     4,090     30,025
ACE's learned mixture    175     2,941     14,480

ACE's current development focuses on interleaving global with local search, and on a variety of structure-targeting representations that should continue to strengthen its ability to learn to search.

Appendix: Metrics for ACE's heuristics

Each metric produces two Advisors. Metrics for variable selection were static degree, dynamic domain size, FF2, dynamic degree, number of valued neighbors, ratio of dynamic domain size to dynamic degree, ratio of dynamic domain size to degree, number of acceptable constraint pairs, static and dynamic edge degree with preference for the higher or lower degree endpoint, weighted degree, and ratio of dynamic domain size to weighted degree (Boussemart et al. 2004). Here, the degree of an edge is the sum of the degrees of its endpoints. The edge degree of a variable is the sum of the edge degrees of the edges on which it is incident. Metrics for value selection were the number of value pairs for the selected variable that include this value and, for each potential value assignment: the minimum resulting domain size among neighbors, the number of value pairs from neighbors to their neighbors, the number of values among neighbors of neighbors, the neighbors' domain sizes, a weighted function of the neighbors' domain sizes, and the product of the neighbors' domain sizes. Two vertices with an edge between them are neighbors.

References

Aardal, K. I., S. P. M. van Hoesel, A. M. C. A. Koster, C. Mannino and A. Sassano (2003). "Models and solution techniques for frequency assignment problems." 4OR 1(4): 261-317.
Bessière, C. (2007). Learning Implied Global Constraints. In Proceedings of IJCAI-2007, Hyderabad, India.
Bessière, C. and J.-C. Régin (2001). Refining the basic constraint propagation algorithm. In Proceedings of IJCAI-2001.
Borrett, J., E. Tsang and T. Walsh (1996). Adaptive constraint satisfaction. In Proceedings of ECAI-96.
Boussemart, F., F. Hemery, C. Lecoutre and L. Sais (2004). Boosting systematic search by weighting constraints. In Proceedings of ECAI-2004, IOS Press.
Brams, S. J. and P. C. Fishburn (2002). Voting procedures. Handbook of Social Choice and Welfare, Volume 1: 173-236.
Carchrae, T. and J. C. Beck (2005). Cost-based Large Neighborhood Search. In Proceedings of the Workshop on the Combination of Metaheuristic and Local Search with Constraint Programming Techniques.
Caseau, Y., G. Silverstein and F. Laburthe (1999). A Meta-Heuristic Factory for Vehicle Routing Problems. In Proceedings of CP-1999, Springer Verlag.
Caseau, Y., G. Silverstein and F. Laburthe (2004). "Learning Hybrid Algorithms for Vehicle Routing Problems." Theory and Practice of Logic Programming 1(6): 779-806.
Cheeseman, P., B. Kanefsky and W. M. Taylor (1991). Where the REALLY Hard Problems Are. In Proceedings of IJCAI-91, Sydney, Australia.
Dechter, R. (2003). Constraint Processing. San Francisco, CA, Morgan Kaufmann.
Dechter, R. and J. Pearl (1987). "Network-based heuristics for constraint satisfaction problems." Artificial Intelligence 34: 1-38.
Eisenberg, C. and B. Faltings (2003). Using the Breakout Algorithm to Identify Hard and Unsolvable Subproblems. In Proceedings of CP-2003, Springer Verlag.
Epstein, S. L. (1994). "For the Right Reasons: The FORR Architecture for Learning in a Skill Domain." Cognitive Science 18(3): 479-511.
Epstein, S. L., E. C. Freuder and R. J. Wallace (2005a). "Learning to Support Constraint Programmers." Computational Intelligence 21(4): 337-371.
Epstein, S. L., E. C. Freuder, R. M. Wallace and X. Li (2005b). Learning Propagation Policies. In Proceedings of the Second International Workshop on Constraint Propagation and Implementation, Sitges, Spain.
Gomes, C., C. Fernandez, B. Selman and C. Bessière (2004). Statistical Regimes Across Constrainedness Regions. In Proceedings of CP-2004, Springer Verlag.
Hoos, H. H. and T. Stützle (2004). Stochastic Local Search: Foundations and Applications. San Francisco, Morgan Kaufmann.
Hulubei, T. and B. O'Sullivan (2005). Search heuristics and heavy-tailed behavior. In Proceedings of CP-2005, Springer Verlag.
Lecoutre, C., F. Boussemart and F. Hemery (2004). Backjump-based techniques versus conflict-directed heuristics. In Proceedings of ICTAI-2004.
Nareyek, A., E. C. Freuder, R. Fourer, E. Giunchiglia, R. P. Goldman, H. Kautz, et al. (2005). "Constraints and AI planning." IEEE Intelligent Systems 20(2): 62-72.
Otten, L., M. Grønkvist and D. P. Dubashi (2006). Randomization in Constraint Programming for Airline Planning. In Proceedings of CP-2006, Springer Verlag.
Petrie, K. E. and B. M. Smith (2003). Symmetry breaking in graceful graphs. In Proceedings of CP-2003, Kinsale, Ireland, Springer Verlag.
Petrovic, S. and S. L. Epstein (2006). Full Restart Speeds Learning. In Proceedings of FLAIRS-2006.
Petrovic, S. and S. L. Epstein (2007a). Preferences Improve Learning to Solve Constraint Problems. In Proceedings of the AAAI-07 Workshop on Preference for Artificial Intelligence.
Petrovic, S. and S. L. Epstein (2007b). Random Subsets Support Learning a Mixture of Heuristics. In Proceedings of FLAIRS-2007, Key West, AAAI.
Refalo, P. (2004). Impact-based search strategies for constraint programming. In Proceedings of CP-2004.
Ruml, W. (2001). Incomplete Tree Search using Adaptive Probing. In Proceedings of IJCAI-2001.
Sabin, D. and E. C. Freuder (1997). Understanding and Improving the MAC Algorithm. In Proceedings of CP-97, Springer Verlag: 167-181.