Abstract
The Canadian Armed Forces is considering a tailored military recruiting approach that focuses on the subset of the population with the highest recruiting potential. In this paper, a logistic regression model (supervised machine learning) was developed using 4 years of historical applicant data. The resulting score was used to rank Canadian postal codes and to identify those with the highest potential for recruiting women. Additional filtering was applied using marketing segments provided by a vendor. The final selection was clustered (unsupervised machine learning) based on the collective social media behaviour of each postal code and was binned by distance to the nearest recruiting centre. The clustering and the binning were used to select the optimum marketing channel and message.







Notes
As of the end of 2019.
Compared to the collective score applied in the U.S., the Canadian PRIZM segments are derived at the postal code level. This granularity allows for better modeling; a U.S. ZIP code is at least 10 times larger by population count than a Canadian postal code.
This was suggested by an anonymous reviewer.
The variance inflation factor (VIF) quantifies the severity of multicollinearity in an ordinary least squares regression analysis. It provides an index that measures how much the variance (the square of the estimate’s standard deviation) of an estimated regression coefficient is increased because of collinearity. A rule of thumb is that if VIF > 3, then multicollinearity is high.
The condition index is the standard measure of ill-conditioning in a matrix. It is computed as the square root of the ratio of the maximum eigenvalue to the minimum eigenvalue. If the condition index is above 30, the regression may have significant multicollinearity. Multicollinearity exists if two or more of the variables associated with a high condition index have a high proportion of variance explained. One advantage of this method is that it shows which variables are collinear.
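The two diagnostics described in the notes above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the function names `variance_inflation_factors` and `condition_index` and the synthetic data are hypothetical, and scaling each column to unit length before computing the condition index is an assumed (though common) convention. The VIF of predictor j is computed as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X.

    R_j^2 is obtained by regressing column j on all other columns
    (plus an intercept) with ordinary least squares.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs

def condition_index(X):
    """Return sqrt(max eigenvalue / min eigenvalue) of X'X.

    Assumption: columns are first scaled to unit length. The eigenvalues
    of X'X are taken as the squared singular values of X, which avoids
    small negative eigenvalues caused by rounding.
    """
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)
    eig = np.linalg.svd(Xs, compute_uv=False) ** 2
    return float(np.sqrt(eig.max() / eig.min()))

# Synthetic example (hypothetical data): x3 is nearly a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.05 * rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X)
ci_collinear = condition_index(X)
ci_independent = condition_index(X[:, :2])
print("VIF per column:", vifs)
print("Condition index (with x3):", ci_collinear)
print("Condition index (x1, x2 only):", ci_independent)
```

In this sketch the near-duplicate columns x1 and x3 should exceed the VIF > 3 rule of thumb, while x2 should stay close to 1. Dropping x3 should also lower the condition index well below the threshold of 30 mentioned above.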
References
Department of National Defence. Strong, secure, engaged: Canada’s Defence Policy; 2017.
Ueno R, Bryce R, Calitoiu D. Ranking clusters of postal codes to improve recruitment in the Canadian Armed Forces. In Proceedings of the 18th IEEE international conference on machine learning and applications, vol. 1, p. 1192–1197, Boca Raton, FL, USA. 2019.
Buttrey SE, Whitaker LR, Alt JK. Developments in the statistical modeling of military recruiting. Chance. 2018;31(2):38–44.
Fulton BM. Determining market categorization of United States zip codes for purposes of army recruiting. Master’s thesis, Naval Postgraduate School, Monterey, CA. 2016.
Monaghan EM. Estimating the depth of the navy recruiting market. Master’s thesis, Naval Postgraduate School, Monterey, CA. 2016.
Nielsen Company. My best segments. https://claritas360.claritas.com/mybestsegments/. Accessed Jan. 31, 2020.
Clingan M. U.S. army custom segmentation system. Presentation at 75th Military Operations Research Society Symposium, June 12–14, Annapolis, MD; 2007.
Ayzenshtat L, Rajamani K, Vishnevskiy A, Georgiev N, Preotescu M. Methods and apparatus to identify affinity between segment attributes and product characteristics. US patent 2019/0073685 A1.
Boone C, Cadenasso M, Grove J, Schwarz K, Buckley G. Landscape, vegetation characteristics, and group identity in an urban and suburban watershed: why the 60s matter. Urban Ecosyst. 2010;13(3):255–71.
Heitgerd JL. Using GIS and demographics to characterize communities at risk: a model for ATSDR. J Environ Health. 2001;64(5):21–30.
Ueno R, Boyd P, Calitoiu D. Identifying geographical areas using machine learning for enrolling women in the Canadian Armed Forces. In Proceedings of the 10th international conference on operations research and enterprise systems, p. 307–316. 2021.
Environics Analytics. Demostats database. https://www.environicsanalytics.com/en-ca/data/demographic/demostats. Accessed August 20, 2021.
Jolliffe IT. Principal component analysis. New York: Springer; 2002.
Skiena SS. The data science design manual. New York: Springer Publishing Company Incorporated; 2017.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
Ward JH. Hierarchical grouping to optimize an objective function. J Am Stat Assoc. 1963;58(301):236–44.
MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol. 1(14), p. 281–297. 1967.
Kodinariya T, Makwana PR. Review on determining of cluster in K-means clustering. Int J Adv Res Comput Sci Manage Stud. 2013;1(01):90–5.
Pham DT, Dimov SS, Nguyen CD. Selection of K in K-means clustering. Proc Inst Mech Eng C J Mech Eng Sci. 2005;219(1):103–19.
Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323(9):533–6.
Kingma DP, Ba J. Adam: a method for stochastic optimization. https://arxiv.org/pdf/1412.6980.pdf. 2015. Accessed August 20, 2021.
Harnett DL. Introduction to statistical methods. Boston: Addison-Wesley Pub. Co.; 1975.
Department of National Defence. Annual report on regular force personnel: 2015–2016. Internal Reference Document: DRDC-RDDC-2018-D086; 2018.
Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B. 1977;39(1):1–38.
Acknowledgements
The authors are very grateful for the insightful comments and suggestions made by the anonymous reviewers. Their input helped in improving the quality of the final version of this paper.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Operations Research and Enterprise Systems” guest edited by Federico Liberatore, Greg H. Parlier and Marc Demange.
Appendices
Appendix
PRIZM Segments: Descriptions and Maps
Below are the descriptions of the PRIZM segments, as defined in the technical documentation provided by Environics Analytics, and the corresponding maps with enrollments (Figs. 8, 9, 10, 11, and 12).
PRIZM 32
Mini Van & Vin Rouge represents a collection of younger and middle-aged families and couples who live in new exurban communities beyond Quebec’s big cities. These households consist of married and common-law couples. Although most households are French-speaking, more than 40 percent are bilingual. Their mixed educations provide good blue- and white-collar jobs in the construction and manufacturing sectors, as well as management roles within the public-service sector, resulting in upper-middle incomes and active lifestyles. Residents here have gym memberships and enjoy sports like hockey, cross-country and downhill skiing. After all that fresh air and exercise, they reward themselves by picking up dinner at a chicken restaurant or kicking back with a glass of Shiraz in their single- or semi-detached homes. For a night out, they might head to the opera or a popular music concert; their idea of a vacation is anything from a resort package to a sightseeing tour around the U.S. and Caribbean.
PRIZM 37
Younger and middle-aged families comprise Trucks & Trades, where skilled tradespeople and blue-collar workers have built a comfortable lifestyle while accumulating tidy savings. Concentrated in Alberta and the Prairies, this segment has a disproportionate number of oil and gas workers who have sought out jobs in resource-rich lands over the past two decades. What workers may lack in education, they make up for with practical skills in primary industries as well as the trades and transportation sector. Many families are younger and middle-aged—most children are under the age of 15—and live in single- and semi-detached houses built between 1961 and 1990. There’s also an above-average presence of mobile dwellings hauled in to accommodate the sudden influx of workers. When not working hard, these households play hard: fishing, hunting, golfing, ATVing, snowmobiling and playing baseball, along with other sports. They also have high rates for owning boats, camping trailers and motorcycles.
PRIZM 16
One of the largest lifestyles in Canada, Pets & PCs is a haven for younger families with preschool children in the new suburbs surrounding larger cities. Nearly half of the children in this segment are under the age of 10, while many of the maintainers are under 45. Pets & PCs households have a strong presence of immigrants from China, the Philippines and India. Few segments have more new housing than this group; most residents have settled into a mix of single-detached, semi-detached and row house developments. With upscale incomes, segment members have crafted an active, child-centred lifestyle. These families participate in many team sports, including baseball, basketball and hockey, and they shuttle kids and their gear to games in spacious SUVs—typically newer models. On weekends, they head to kid-friendly destinations like zoos, aquariums and amusement parks. They fill their homes with an array of computers and electronic gear, including video game systems, tablets and just about anything that will occupy their children while the moms and dads grab the occasional date night at the movies or dinner at their favourite seafood restaurants.
PRIZM 24
Widely dispersed across Canada, Fresh Air Families is one of the largest segments—and growing. Found in rapidly expanding exurban communities, these neighbourhoods feature a mix of middle-aged couples and families with children of a broad spectrum of ages. While most adults have high school, trade school or college educations, these dual-income households enjoy solid, upper-middle-income lifestyles thanks to positions in public administration, construction and the skilled trades. They own single-detached homes, typically built since 1990, and nine out of 10 commute by car to jobs in nearby suburbs. With its couples and families, the segment scores high for a range of marketplace preferences, frequenting big-box retailers, large department stores and discount grocers. Members of Fresh Air Families enjoy the great outdoors, particularly fishing, boating, canoeing and camping. Indeed, some of their favourite leisure activities are evident in their driveways, typically cluttered with boats, campers or motorcycles—and pickup trucks to haul them to parks and campgrounds. But they also enjoy indoor pursuits like crafting and knitting.
PRIZM 63
Located in dense, industrial neighbourhoods scattered across mid-sized cities, Lunch at Tim’s consists of singles, families and solo-parent households living in older single-detached homes, semis and duplexes. They’re the kind of tight-knit communities where residents enjoy socializing at local eateries like Tim Hortons—as well as pizza places, burger joints and fish-and-chip restaurants. With an unusually mixed age profile—it’s no longer the bi-modal segment of the past—Lunch at Tim’s residents have above-average rates for residents who are single, divorced, separated or widowed; nearly half the adults in these neighbourhoods are unattached. Despite the lower-middle incomes, roughly two-thirds of households own their homes, mostly built before 1980. Residents enjoy quieter pastimes and have high rates for knitting and woodworking, as well as outdoor activities like hiking and swimming. When the mood strikes, they might play a friendly game of curling or splurge on tickets to a dinner theatre, baseball game or boat or craft show.
Cite this article
Calitoiu, D., Ueno, R. & Boyd, P. Supervised and Unsupervised Machine Learning Methods Applied for Enrolling Women in the Canadian Armed Forces. SN COMPUT. SCI. 3, 250 (2022). https://doi.org/10.1007/s42979-022-01115-y