Module:en-headword

From Wiktionary, the free dictionary

This module is used for English headword-line templates. It currently implements {{en-noun}}, {{en-proper noun}}, {{en-verb}}, {{en-adj}}, {{en-adv}}, {{en-intj}} and {{en-con}} (for conjunctions). See the documentation of those templates for more information. Other English headword templates are being converted to use this module.

The module is always invoked the same way, by passing a single parameter to the "show" function. This parameter is the plural name of the part of speech (the examples below are for nouns and adjective forms, respectively):

{{#invoke:en-headword|show|nouns}}
{{#invoke:en-headword|show|adjective forms}}

The template will, by default, accept the following parameters (specific parts of speech may accept or require others):

|head=, |head2=, |head3=, ...
Override the headword display; used to add links to individual words in a multiword term.
|id=
Sense ID for linking to this headword. See {{senseid}} for more information.
|nolink=1 or |nolinkhead=1
Don't link individual words in the headword of a multiword term. Useful for foreign or otherwise unanalyzable terms like a posteriori and yabba dabba doo where the expression functions as a whole in English but the individual parts are not English words.
|splithyph=1
Indicate that automatic splitting and linking of words should split on hyphens in multiword expressions with spaces in them, even if the hyphenated component would normally be linked as-is or with hyphens converted to spaces. See #Autosplitting below.
|nosplithyph=1
Indicate that automatic splitting and linking of words should not split on hyphens in multiword expressions with spaces in them, even if this would normally happen. See #Autosplitting below.
|hyphspace=1
Indicate that hyphenated components should be linked as a whole using the space-separated equivalent, even if this would not normally happen (i.e. because the space-separated equivalent is not defined as an English term). See #Autosplitting below.
|nosuffix=1
Prevent terms beginning with a hyphen from being interpreted as suffixes. See #Suffix handling below.
|nomultiwordcat=1
Prevent multiword terms (those with spaces or with hyphens in the middle) from being added to Category:English multiword terms.
|pagename=
Override the page name used to compute default values of various sorts. Useful when testing, for documentation pages, etc.
|sort=
Sort key. Rarely needs to be specified, as it is normally automatically generated.

Autosplitting

All templates using this module use an intelligent autosplitting algorithm to link portions of multipart and multiword expressions, as follows:

  • If there are spaces in the term but no apostrophes or hyphens, the module will automatically split and link distinct space-separated words, similarly to {{head}}; hence, absent without leave will be linked as [[absent]] [[without]] [[leave]].
  • If there are spaces and apostrophes but no hyphens, the module will likewise split and link distinct space-separated words, but may also split up words with apostrophes in them. Specifically:
    1. If a word ends in 's, the part before the 's will be linked as a word, and the 's will be linked separately to -'s, on the assumption that the 's is functioning as a possessive. For example, Abel's impossibility theorem will be linked as [[Abel]][[-'s|'s]] [[impossibility]] [[theorem]]. (An exception is made for one's, someone's, he's, she's and it's, which are linked as-is without splitting.)
    2. If a word ends in ', the apostrophe will be linked to -' (on the assumption that the ' is functioning as a plural possessive, similarly to above), and the part before will be separately linked. If the part before ends in an s, the module converts it to its singular equivalent and looks that up to see if it exists and has a definition as an English term. If so, the term is linked to the singular form; otherwise, it is linked to the plural form. (Converting to the singular means that -ies becomes -y; -es is dropped after sh, ch and x; and otherwise s is dropped.) For example, flies' graveyard will be linked as [[fly|flies]][[-'|']] [[graveyard]] because fly exists as an English term, but Achilles' heel will be linked as [[Achilles]][[-'|']] [[heel]] because Achille does not exist as an English term.
    3. All other terms containing apostrophes are linked unsplit.
  • If there are hyphens in the term but no spaces or apostrophes, the hyphenated components will be linked individually. For example, beggar-thy-neighbor will be linked as [[beggar]]-[[thy]]-[[neighbor]].
    • An exception to this occurs with certain recognized prefixes, which are linked with the hyphen included in the prefix. For example, Afro-American is linked as [[Afro-]][[American]] and co-occurrence is linked as [[co-]][[occurrence]], because Afro- and co- are in the list of recognized prefixes. (For the full list, see below.)
  • If there are hyphens and apostrophes but no spaces, the effect is similar to the situation with spaces and apostrophes. For example, beggar's-lice is linked as [[beggar]][[-'s|'s]]-[[lice]].
  • If there are both hyphens and spaces, the space-separated components that do not have hyphens will be linked separately, as above. Any hyphen-separated components may be linked in one of three ways:
    1. If |hyphspace=1 is specified or the hyphen-separated component exists as an English term when the hyphens are converted to spaces, it will be linked to that term. For example, closed-circuit television will be linked as [[closed circuit|closed-circuit]] [[television]] because closed circuit exists as an English term. (In this case, closed-circuit also exists but is approximately a soft redirect to closed circuit, as is often the case with such attributive compounds. This is why we prefer the space-separated variant.)
    2. If |nosplithyph=1 is specified or the hyphen-separated component exists as an English term in its unmodified form but not when the hyphens are converted to spaces, it will be linked as an unmodified whole. For example, coin-operated laundry will be linked as [[coin-operated]] [[laundry]] because coin-operated exists as an English term but coin operated does not. (An example that requires |nosplithyph=1 is close-up lens, where the default algorithm would incorrectly link the first component to close up. Here, close up [a verb] and close-up [an adjective] both exist but refer to different things.)
    3. If |splithyph=1 is specified or the hyphen-separated component does not exist as an English term (either unmodified or when the hyphens are converted to spaces), each hyphenated component is linked separately. Examples where this happens are adult-onset diabetes (linked as [[adult]]-[[onset]] [[diabetes]]) and Bombieri-Friedlander-Iwaniec theorem (linked as [[Bombieri]]-[[Friedlander]]-[[Iwaniec]] [[theorem]]). Note that when separately linking hyphenated components, prefixes are recognized and handled specially, as documented below.
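
The plural-possessive rule in the list above can be sketched in standalone Lua. This is only an illustration under stated assumptions: naive_singularize and the fixed known_english_terms table are hypothetical stand-ins, whereas the module itself calls a singularize helper from Module:string utilities and checks page content on the wiki.

```lua
-- Hypothetical stand-in for the module's page lookup; a real implementation
-- would consult the wiki rather than a fixed table.
local known_english_terms = { fly = true, graveyard = true, heel = true }

local function is_english(term)
	return known_english_terms[term] == true
end

-- Convert a plural ending in "s" to its presumed singular, following the
-- documented rules: -ies becomes -y; -es is dropped after sh, ch and x;
-- otherwise the final s is dropped.
local function naive_singularize(plural)
	local stem = plural:match("^(.*)ies$")
	if stem then
		return stem .. "y"
	end
	stem = plural:match("^(.*sh)es$") or plural:match("^(.*ch)es$") or plural:match("^(.*x)es$")
	if stem then
		return stem
	end
	return (plural:gsub("s$", ""))
end

-- Link a word ending in a bare apostrophe (assumed plural possessive).
local function link_plural_possessive(word)
	local base = word:match("^(.*)'$")
	if not base then
		return "[[" .. word .. "]]"
	end
	if base:find("s$") then
		local singular = naive_singularize(base)
		if is_english(singular) then
			return "[[" .. singular .. "|" .. base .. "]][[-'|']]"
		end
	end
	return "[[" .. base .. "]][[-'|']]"
end

print(link_plural_possessive("flies'"))
print(link_plural_possessive("Achilles'"))
```

With the stand-in table above, flies' links to fly because fly is listed, while Achilles' stays linked to Achilles because Achille is not.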

Special prefix handling

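As described above, when splitting hyphenated components, if a component is not the last component and matches one of the recognized prefixes, the hyphen that follows it is included inside the link. The recognized prefixes are those in the en_include_hyphen_prefixes list in the module code: acro, acousto, Afro, agro, anarcho, angio, Anglo, ante, anti, arch, auto, bi, bio, cis, co, cryo, crypto, de, demi, eco, electro, Euro, ex, Greco, hemi, hydro, hyper, hypo, infra, Indo, inter, intra, Judeo, macro, meta, micro, mini, multi, neo, neuro, non, para, peri, post, pre, pro, proto, pseudo, re, semi, sub, super, trans, un and vice. Components that are also words in their own right (e.g. be, counter, cross, extra, half, mid, over, pan, under) are deliberately not included.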
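
A minimal sketch of this prefix rule follows. It assumes an abbreviated prefix set and a hypothetical helper named link_hyphenated; the module itself delegates the real work to add_links_to_multiword_term in Module:headword utilities, whose internals are not shown here.

```lua
-- Abbreviated, illustrative copy of the recognized-prefix set.
local include_hyphen_prefixes = { Afro = true, co = true, anti = true }

-- Link each hyphen-separated component, keeping the hyphen inside the link
-- when a non-final component is a recognized prefix. Empty components
-- (e.g. from a leading hyphen) are ignored by this sketch.
local function link_hyphenated(term)
	local parts = {}
	for part in term:gmatch("[^%-]+") do
		parts[#parts + 1] = part
	end
	local out = {}
	for i, part in ipairs(parts) do
		local is_last = i == #parts
		if is_last then
			out[#out + 1] = "[[" .. part .. "]]"
		elseif include_hyphen_prefixes[part] then
			-- Prefix: the hyphen becomes part of the link target and text.
			out[#out + 1] = "[[" .. part .. "-]]"
		else
			-- Ordinary component: the hyphen stays outside the link.
			out[#out + 1] = "[[" .. part .. "]]-"
		end
	end
	return table.concat(out)
end

print(link_hyphenated("Afro-American"))
print(link_hyphenated("beggar-thy-neighbor"))
```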

Suffix handling

If the term begins with a hyphen (-), it is assumed to be a suffix rather than a base form, and is categorized into Category:English suffixes and Category:English POS-forming suffixes rather than Category:English POSs (e.g. Category:English noun-forming suffixes rather than Category:English nouns). This can be overridden using |nosuffix=1. (An example where this is necessary is -ussification, which refers to a linguistic process of blending words with the suffix -ussy but is not itself a suffix.)
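
The suffix check can be sketched as below. The function names are hypothetical; the condition mirrors the one in the module's show function, which also treats a term starting with a double hyphen as a non-suffix and skips the check for the "suffix forms" part of speech (omitted here for brevity).

```lua
-- Return true if the page name should be treated as a suffix entry.
-- `nosuffix` corresponds to the |nosuffix=1 template parameter.
local function is_suffix_entry(pagename, nosuffix)
	if nosuffix then
		return false
	end
	-- A single leading hyphen marks a suffix; a double hyphen does not.
	return pagename:find("^%-") ~= nil and pagename:find("^%-%-") == nil
end

-- Build the "POS-forming suffixes" category name from a singular
-- part-of-speech name such as "noun".
local function suffix_category(langname, singular_pos)
	return langname .. " " .. singular_pos .. "-forming suffixes"
end

print(is_suffix_entry("-ness", false))
print(suffix_category("English", "noun"))
```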

Link modifications

The default behavior described above under #Autosplitting is sufficient in most circumstances, but some multiword terms need special linking behavior to handle things like inflected terms (e.g. those ending in -ing or -s), capitalized terms, multiword subexpressions, etc. One way to handle that is to use |head= and spell out the entire headword, appropriately linked, effectively ignoring the default linking behavior. But this can be awkward for long multiword terms. For cases like this, a shortcut syntax is provided to apply link modifications on top of the autolinked term. To enable this, put a tilde (~) at the beginning of the value specified to |head=, followed by the changes to individual words.

For example, for the term acute necrotising ulcerative gingivitis, we would like to link necrotising to necrotise. This can be done as follows:

  • {{en-noun|head=~necrotising:necrotise}}

or more compactly as

  • {{en-noun|head=~necrotis[ing:e]}}

This is equivalent to writing {{en-noun|head=[[acute]] [[necrotise|necrotising]] [[ulcerative]] [[gingivitis]]}}, but shorter. In general, syntax of the form prefix[from:to] is equivalent to writing prefixfrom:prefixto, and says to replace prefixfrom with prefixto in the default output produced by the #Autosplitting mechanism described above.
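
The bracket shorthand can be illustrated with a small expansion function. This is a sketch under assumptions, not the implementation of apply_link_modifiers: the name expand_bracket_shorthand is hypothetical, and it assumes at most one bracketed group per modification.

```lua
-- Expand "prefix[from:to]suffix" into "prefixfromsuffix:prefixtosuffix".
-- A spec without a bracketed group is returned unchanged.
local function expand_bracket_shorthand(spec)
	local prefix, from, to, suffix = spec:match("^(.-)%[([^:%]]*):([^%]]*)%](.*)$")
	if not prefix then
		return spec
	end
	return prefix .. from .. suffix .. ":" .. prefix .. to .. suffix
end

print(expand_bracket_shorthand("necrotis[ing:e]"))
print(expand_bracket_shorthand("[A:a]dmiral"))
```

The same function handles a bracket at the start of a word, since the text after the closing bracket is carried over to both sides.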

The same syntax works on the beginning of a word, which is especially useful when linking to the lowercase equivalent of a capitalized term. For example, for admiral of the Swiss Navy, use the following to link Navy to navy:

  • {{en-noun|head=~[N:n]avy}}

This is equivalent to writing {{en-noun|head=[[admiral]] [[of]] [[the]] [[Swiss]] [[navy|Navy]]}} but shorter.

Modifications need to match full words, but can be applied to multiple words. A ~ on the right-hand side is a shortcut that stands for the left-hand side, which is especially useful when multiple words are given on the left-hand side, and causes the words to be linked together. For example, for acute respiratory distress syndrome, to link respiratory distress as a single entity, use the following:

  • {{en-noun|head=~respiratory distress:~}}

which is equivalent to {{en-noun|head=[[acute]] [[respiratory distress]] [[syndrome]]}}. The right-hand side need not consist solely of a tilde, but can contain other surrounding text. For example, for Charlie Brown Christmas tree, use the following to link to the Wikipedia entry for Charlie Brown:

  • {{en-noun|head=~Charlie Brown:w:~}}

This is equivalent to writing {{en-noun|head=[[w:Charlie Brown|Charlie Brown]] [[Christmas]] [[tree]]}}.
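
The tilde shortcut on the right-hand side can be sketched as follows. The helper names expand_tilde and build_link are hypothetical, and the sketch only shows how a single from:to pair might be turned into a link; how the module locates the words in the autolinked headword is not covered.

```lua
-- Replace every "~" in the right-hand side with the left-hand side.
-- A function replacement is used so that "%" in `from` is not interpreted
-- as a gsub escape.
local function expand_tilde(from, to)
	return (to:gsub("~", function() return from end))
end

-- Build the wikilink for one modification. If the target equals the
-- displayed text, a plain link is enough.
local function build_link(from, to)
	local target = expand_tilde(from, to)
	if target == from then
		return "[[" .. from .. "]]"
	end
	return "[[" .. target .. "|" .. from .. "]]"
end

print(build_link("respiratory distress", "~"))
print(build_link("Charlie Brown", "w:~"))
```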

Multiple modifications can be specified, separated by a semicolon (optionally with surrounding spaces). For example, for Admiral of the Fleet, use:

  • {{en-noun|head=~[A:a]dmiral; [F:f]leet}}

This is equivalent to writing {{en-noun|head=[[admiral|Admiral]] [[of]] [[the]] [[fleet|Fleet]]}}.
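
Splitting a value with several modifications might look like the sketch below. parse_modifications is a hypothetical name, and the parsing rules (split on semicolons, trim spaces, expand one bracketed group, otherwise split at the first colon) are assumptions drawn from the examples in this documentation rather than from the module's actual parser.

```lua
-- Parse the text after the leading "~" into a list of {from, to} pairs.
local function parse_modifications(spec)
	local mods = {}
	for piece in spec:gmatch("[^;]+") do
		-- Trim optional spaces around each semicolon-separated piece.
		piece = piece:match("^%s*(.-)%s*$")
		if piece ~= "" then
			local prefix, from, to, suffix = piece:match("^(.-)%[([^:%]]*):([^%]]*)%](.*)$")
			if prefix then
				-- Bracket shorthand: prefix[from:to]suffix.
				from, to = prefix .. from .. suffix, prefix .. to .. suffix
			else
				-- Plain form: split at the first colon only, so that a
				-- right-hand side such as "w:~" keeps its own colon.
				from, to = piece:match("^(.-):(.*)$")
			end
			if not from then
				error("Malformed modification: " .. piece)
			end
			mods[#mods + 1] = { from = from, to = to }
		end
	end
	return mods
end

for _, mod in ipairs(parse_modifications("[A:a]dmiral; [F:f]leet")) do
	print(mod.from, mod.to)
end
```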


local export = {}
local pos_functions = {}

local force_cat = false -- for testing; if true, categories appear in non-mainspace pages

local require = require
local require_when_needed = require("Module:require when needed")

local en_utilities_module = "Module:en-utilities"
local headword_utilities_module = "Module:headword utilities"
local headword_module = "Module:headword"
local inflection_utilities_module = "Module:inflection utilities"
local JSON_module = "Module:JSON"
local links_module = "Module:links"
local parameters_module = "Module:parameters"
local string_utilities_module = "Module:string utilities"
local table_module = "Module:table"
local utilities_module = "Module:utilities"

local iut = require_when_needed(inflection_utilities_module)

local add_links_to_multiword_term = require_when_needed(headword_utilities_module, "add_links_to_multiword_term")
local add_suffix = require_when_needed(en_utilities_module, "add_suffix")
local apply_link_modifiers = require_when_needed(headword_utilities_module, "apply_link_modifiers")
local concat = table.concat
local format_categories = require_when_needed(utilities_module, "format_categories")
local full_headword = require_when_needed(headword_module, "full_headword")
local get_link_page = require_when_needed(links_module, "get_link_page")
local insert = table.insert
local is_regular_plural = require_when_needed(en_utilities_module, "is_regular_plural")
local list_to_set = require_when_needed(table_module, "listToSet")
local remove = table.remove
local remove_links = require_when_needed(links_module, "remove_links")
local process_params = require_when_needed(parameters_module, "process")
local singularize = require_when_needed(string_utilities_module, "singularize")
local split = require_when_needed(string_utilities_module, "split")
local toJSON = require_when_needed(JSON_module, "toJSON")
local toNFD = mw.ustring.toNFD
local ulen = require_when_needed(string_utilities_module, "len")
local ulower = require_when_needed(string_utilities_module, "lower")
local umatch = require_when_needed(string_utilities_module, "match")

local lang = require("Module:languages").getByCode("en")
local langname = lang:getCanonicalName()

local function glossary_link(entry, text)
	text = text or entry
	return "[[Appendix:Glossary#" .. entry .. "|" .. text .. "]]"
end

local function track(page)
	require("Module:debug/track")("en-headword/" .. page)
	return true
end

------------------------------------------- UTILITY FUNCTIONS ------------------------------------------

-- These functions are used directly in the <> format as well as in the utility functions #2 below.

local function compute_double_last_cons_stem(term)
	local last_cons = term:match("([bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ])$")
	if not last_cons then
		error("Verb stem '" .. term .. "' must end in a consonant to use ++")
	end
	return term .. last_cons
end

local function compute_plusplus_s_form(term, default_s_form)
	if term:find("[sz]$") then
		-- regas -> regasses, derez -> derezzes
		return compute_double_last_cons_stem(term) .. "es"
	else
		return default_s_form
	end
end


-- The main entry point.
-- This is the only function that can be invoked from a template.
function export.show(frame)

	local poscat = frame.args[1] or error("Part of speech has not been specified. Please pass parameter 1 to the module invocation.")
	
	local boolean = {type = "boolean"}
	local params = {
		["head"] = {list = true},
		["id"] = true,
		["json"] = boolean,
		["sort"] = true,
		["splithyph"] = boolean,
		["nosplithyph"] = boolean,
		["hyphspace"] = boolean,
		["nolink"] = boolean,
		["nolinkhead"] = {type = "boolean", alias_of = "nolink"},
		["nosuffix"] = boolean,
		["nomultiwordcat"] = boolean,
		["pagename"] = true, -- for testing
	}

	local pos_data = pos_functions[poscat]
	if pos_data then
		for key, val in pairs(pos_data.params) do
			params[key] = val
		end
	end

	local args = process_params(frame:getParent().args, params, nil, "en-headword", "show")

	local pagename = args.pagename or mw.loadData("Module:headword/data").pagename -- Accounts for unsupported titles.

	local user_specified_heads = args.head
	local heads = user_specified_heads
	local autohead
	if args.nolink or not pagename:find("[ '%-]") then
		autohead = pagename
	else
		local en_no_split_apostrophe_words = list_to_set{
			"one's",
			"someone's",
			"he's",
			"she's",
			"it's",
		}

		local en_include_hyphen_prefixes = list_to_set{
			-- We don't include things that are also words even though they are often (perhaps mostly) prefixes, e.g.
			-- "be", "counter", "cross", "extra", "half", "mid", "over", "pan", "under".
			"acro",
			"acousto",
			"Afro",
			"agro",
			"anarcho",
			"angio",
			"Anglo",
			"ante",
			"anti",
			"arch",
			"auto",
			"bi",
			"bio",
			"cis",
			"co",
			"cryo",
			"crypto",
			"de",
			"demi",
			"eco",
			"electro",
			"Euro",
			"ex",
			"Greco",
			"hemi",
			"hydro",
			"hyper",
			"hypo",
			"infra",
			"Indo",
			"inter",
			"intra",
			"Judeo",
			"macro",
			"meta",
			"micro",
			"mini",
			"multi",
			"neo",
			"neuro",
			"non",
			"para",
			"peri",
			"post",
			"pre",
			"pro",
			"proto",
			"pseudo",
			"re",
			"semi",
			"sub",
			"super",
			"trans",
			"un",
			"vice",
		}

		local function is_english(term)
			local title = mw.title.new(term)
			if title and title.exists then
				local content = title:getContent()
				if content and content:find("==English==\n") then
					return true
				end
			end
			return false
		end

		local function en_split_hyphen_when_space(word)
			if not word:find("%-") then
				return nil
			end
			if args.hyphspace then
				return "[[" .. word:gsub("%-", " ") .. "|" .. word .. "]]"
			end
			if args.nosplithyph then
				return "[[" .. word .. "]]"
			end
			if not args.splithyph then
				local space_word = word:gsub("%-", " ")
				if is_english(space_word) then
					return "[[" .. space_word .. "|" .. word .. "]]"
				end
				if is_english(word) then
					return "[[" .. word .. "]]"
				end
			end
			return nil
		end

		local function en_split_apostrophe(word)
			local base = word:match("^(.*)'s$")
			if base then
				return "[[" .. base .. "]][[-'s|'s]]"
			end
			base = word:match("^(.*)'$")
			if base then
				if base:find("s$") then
					local sg = singularize(base)
					if is_english(sg) then
						return "[[" .. sg .. "|" .. base .. "]][[-'|']]"
					end
				end
				return "[[" .. base .. "]][[-'|']]"
			end
			return "[[" .. word .. "]]"
		end

		autohead = add_links_to_multiword_term(pagename, {
			split_hyphen_when_space = en_split_hyphen_when_space,
			split_apostrophe = en_split_apostrophe,
			no_split_apostrophe_words = en_no_split_apostrophe_words,
			include_hyphen_prefixes = en_include_hyphen_prefixes,
		})
	end

	if #heads == 0 then
		heads = {autohead}
	else
		for i, head in ipairs(heads) do
			if head:find("^~") then
				head = apply_link_modifiers(autohead, head:sub(2))
				heads[i] = head
			end
			if head == autohead then
				track("redundant-head")
			end
		end
	end

	local data = {
		lang = lang,
		pos_category = poscat,
		categories = {},
		heads = heads,
		user_specified_heads = user_specified_heads,
		no_redundant_head_cat = #user_specified_heads == 0,
		inflections = {},
		nomultiwordcat = args.nomultiwordcat,
		sort_key = args.sort,
		pagename = args.pagename,
		-- This is always set, and in the case of unsupported titles, it's the displayed version (e.g. 'C|N>K' instead of
		-- 'Unsupported titles/C through N to K').
		displayed_pagename = pagename,
		id = args.id,
		force_cat_output = force_cat,
	}

	local is_suffix = false
	if not args.nosuffix and pagename:find("^%-") and not pagename:find("^%-%-") and poscat ~= "suffix forms" then
		is_suffix = true
		data.pos_category = "suffixes"
		local singular_poscat = singularize(poscat)
		insert(data.categories, langname .. " " .. singular_poscat .. "-forming suffixes")
		insert(data.inflections, {label = singular_poscat .. "-forming suffix"})
	end

	if pos_data then
		pos_data.func(args, data, is_suffix)
	end

	local extra_categories = {}
	if pagename:find("[Qq][^Uu]") or pagename:find("[Qq]$") then
		insert(data.categories, langname .. " words containing Q not followed by U")
	end
	-- toNFD performs decomposition, so letters that decompose to an ASCII
	-- vowel and a diacritic, such as é, are counted as vowels and do not
	-- need to be included in the pattern.
	if not umatch(ulower(toNFD(pagename)), "[aeiouyæœøəªºαεηιουω]") then
		insert(data.categories, langname .. " words without vowels")
	end
	if pagename:find("yre$") then
		insert(data.categories, langname .. ' words ending in "-yre"')
	end
	if not pagename:find(" ") and ulen(pagename) >= 25 then
		insert(extra_categories, "Long " .. langname .. ' words')
	end
	if pagename:find("^[^aeiou ]*a[^aeiou ]*e[^aeiou ]*i[^aeiou ]*o[^aeiou ]*u[^aeiou ]*$") then
		insert(data.categories, langname .. ' words that use all vowels in alphabetical order')
	end

	if args.json then
		return toJSON(data)
	end

	return full_headword(data)
		.. (#extra_categories > 0
			and format_categories(extra_categories, lang, args.sort)
			or "")
end


-- This function does the common work between adjectives and adverbs
local function make_comparatives(params, data)
	local comp_parts = {label = glossary_link("comparative"), accel = {form = "comparative"}}
	local sup_parts = {label = glossary_link("superlative"), accel = {form = "superlative"}}
	local pagename = data.displayed_pagename

	if #params == 0 then
		insert(params, {"more"})
	end

	-- Go over each parameter given and create a comparative and superlative
	-- form.
	for i, val in ipairs(params) do
		local comp = val[1]
		local comp_qual = val[2]
		local sup = val[3]
		local sup_qual = val[4]
		local comp_part, sup_part

		if comp == "more" and pagename ~= "many" and pagename ~= "much" then
			comp_part = "more [[" .. pagename .. "]]"
			sup_part = sup or "most [[" .. pagename .. "]]"
		elseif comp == "further" and pagename ~= "far" then
			comp_part = "further [[" .. pagename .. "]]"
			sup_part = sup or "furthest [[" .. pagename .. "]]"
		elseif comp == "er" then
			-- Add the "-er" and "-est" suffixes.
			comp_part = add_suffix(pagename, "r")
			sup_part = sup or add_suffix(pagename, "st.superlative")
		elseif comp == "ier" then
			if pagename:sub(-1) ~= "y" then
				error("Can't specify 'ier' comparative unless the term ends with 'y'.")
			end
			comp_part = pagename:gsub("e?y$", "ier")
			sup_part = sup or pagename:gsub("e?y$", "iest")
		elseif comp == "-" or sup == "-" then
			-- Allowing '-' makes it more flexible to not have some forms
			if comp ~= "-" then
				comp_part = comp
			end
			if sup ~= "-" then
				sup_part = sup
			end
		else
			-- If the full comparative was given, but no superlative, then
			-- create it by replacing the ending -er with -est.
			if not sup then
				if comp:sub(-2) == "er" then
					sup = comp:sub(1, -3) .. "est"
				else
					error("The superlative of \"" .. comp .. "\" cannot be generated automatically. Please provide it with the \"sup" .. (i == 1 and "" or i) .. "=\" parameter.")
				end
			end

			comp_part = comp
			sup_part = sup
		end

		if comp_part then
			insert(comp_parts, {term = comp_part, q = {comp_qual}})
		end
		if sup_part then
			insert(sup_parts, {term = sup_part, q = {sup_qual}})
		end
	end

	insert(data.inflections, comp_parts)
	insert(data.inflections, sup_parts)
end


local function make_heads_definite(args, data)
	if args.def == "~" then
		local newheads = {}
		for _, head in ipairs(data.heads) do
			insert(newheads, head)
			insert(newheads, "the " .. head)
		end
		data.heads = newheads
	else
		for i, head in ipairs(data.heads) do
			data.heads[i] = "the " .. head
		end
	end
end


local function non_op()
end


pos_functions["adjectives"] = {
	params = {
		[1] = {list = true, allow_holes = true},
		["def"] = true,
		["the"] = {alias_of = "def"},
		["comp_qual"] = {list = "comp\1_qual", allow_holes = true},
		["sup"] = {list = true, allow_holes = true},
		["sup_qual"] = {list = "sup\1_qual", allow_holes = true},
		},
	func = function(args, data)
		local shift = 0
		local is_not_comparable = false
		local is_comparative_only = false

		if args.def then
			make_heads_definite(args, data)
		end

		-- If the first parameter is ?, then don't show anything, just return.
		if args[1][1] == "?" then
			return
		-- If the first parameter is -, then move all parameters up one position.
		elseif args[1][1] == "-" then
			shift = 1
			is_not_comparable = true
		-- If the only argument is +, then remember this and clear parameters
		elseif args[1][1] == "+" and args[1].maxindex == 1 then
			shift = 1
			is_comparative_only = true
		end

		-- Gather all the comparative and superlative parameters.
		local params = {}

		for i = 1, args[1].maxindex - shift do
			local comp = args[1][i + shift]
			local comp_qual = args["comp_qual"][i + shift]
			local sup = args["sup"][i]
			local sup_qual = args["sup_qual"][i + shift]

			if comp or sup then
				insert(params, {comp, comp_qual, sup, sup_qual})
			end
		end

		if shift == 1 then
			-- If the first parameter is "-" but there are no parameters,
			-- then show "not comparable" only and return.
			-- If there are parameters, then show "not generally comparable"
			-- before the forms.
			if #params == 0 then
				if is_not_comparable then
					insert(data.inflections, {label = "not " .. glossary_link("comparable")})
					insert(data.categories, langname .. " uncomparable adjectives")
					return
				end
				if is_comparative_only then
					insert(data.inflections, {label = glossary_link("comparative") .. " form only"})
					insert(data.categories, langname .. " comparative-only adjectives")
					return
				end
			else
				insert(data.inflections, {label = "not generally " .. glossary_link("comparable")})
			end
		end

		-- Process the parameters
		make_comparatives(params, data)
	end
}

pos_functions["adverbs"] = {
	params = {
		[1] = {list = true, allow_holes = true},
		["comp_qual"] = {list = "comp\1_qual", allow_holes = true},
		["sup"] = {list = true, allow_holes = true},
		["sup_qual"] = {list = "sup\1_qual", allow_holes = true},
		},
	func = function(args, data)
		local shift = 0

		-- If the first parameter is ?, then don't show anything, just return.
		if args[1][1] == "?" then
			return
		-- If the first parameter is -, then move all parameters up one position.
		elseif args[1][1] == "-" then
			shift = 1
		end

		-- Gather all the comparative and superlative parameters.
		local params = {}

		for i = 1, args[1].maxindex - shift do
			local comp = args[1][i + shift]
			local comp_qual = args["comp_qual"][i + shift]
			local sup = args["sup"][i]
			local sup_qual = args["sup_qual"][i + shift]

			if comp or sup then
				insert(params, {comp, comp_qual, sup, sup_qual})
			end
		end

		if shift == 1 then
			-- If the first parameter is "-" but there are no parameters,
			-- then show "not comparable" only and return. If there are parameters,
			-- then show "not generally comparable" before the forms.
			if #params == 0 then
				insert(data.inflections, {label = "not " .. glossary_link("comparable")})
				insert(data.categories, langname .. " uncomparable adverbs")
				return
			else
				insert(data.inflections, {label = "not generally " .. glossary_link("comparable")})
			end
		end

		-- Process the parameters
		make_comparatives(params, data)
	end
}

pos_functions["conjunctions"] = {
	params = {
		[1] = { alias_of = "head" },
	},
	func = non_op,
}

pos_functions["interjections"] = {
	params = {
		[1] = { alias_of = "head" },
	},
	func = non_op,
}

local function escape(str)
	return (str:gsub("\\([:#])", "\\\\%1")
		:gsub("[:#]", "\\%0"))
end

local function canonicalize_plural(pl, pagename, pos)
	if pl == "+" then
		return escape(add_suffix(pagename, "s.plural", pos))
	elseif pl == "++" then
		return escape(compute_plusplus_s_form(pagename, add_suffix(pagename, "s.plural", pos)))
	elseif pl == "*" then
		return escape(pagename)
	elseif pl == "ies" then
		if pagename:sub(-1) == "y" then
			return escape(pagename:gsub("e?y$", pl))
		end
		error("Can't specify 'ies' plural unless the term ends with 'y'.")
	elseif pl == "s" or pl == "es" or pl == "'s" then
		return escape(pagename .. pl)
	end
end

local function do_nouns(args, data, pos)
	local pagename = data.displayed_pagename
	pos = pos or "noun"

	local function gather_inflections_with_quals(infl_field, qual_field, label)
		-- Gather all the plural parameters from the numbered parameters.
		local infls = {}
		if label then
			infls.label = label
		end

		for i, infl in ipairs(args[infl_field]) do
			local qual = args[qual_field][i]

			if qual then
				insert(infls, {term = infl, q = {qual}})
			else
				insert(infls, infl)
			end
		end

		return infls
	end

	if args.def then
		make_heads_definite(args, data)
	end

	local plurals = gather_inflections_with_quals(1, "plqual")

	if plurals[1] == "p" then
		-- plurale tantum
		if #plurals > 1 then
			error("With plurale tantum noun, can't specify more than one plural")
		end
		data.genders = {"p"} -- this should auto-insert the correct 'pluralia tantum' category
		if #args.sg > 0 then
			insert(data.inflections, {label = "normally plural"})
			insert(data.inflections, gather_inflections_with_quals("sg", "sgqual", "singular"))
		else
			insert(data.inflections, {label = "plural only"})
		end
		if #args.attr > 0 then
			insert(data.inflections, gather_inflections_with_quals("attr", "attrqual", "attributive"))
		end
		return
	end

	local need_default_plural = pos == "noun"
	if plurals[1] == "-" then
		-- Uncountable noun; may occasionally have a plural
		remove(plurals, 1)  -- Remove the "-"
		insert(data.categories, langname .. " uncountable nouns")

		-- If plural forms were given explicitly, then show "usually"
		if #plurals > 0 then
			insert(data.inflections, {label = "usually " .. glossary_link("uncountable")})
			insert(data.categories, langname .. " countable nouns")
		else
			insert(data.inflections, {label = glossary_link("uncountable")})
		end
		need_default_plural = false
	elseif plurals[1] == "~" then
		-- Mixed countable/uncountable noun, always has a plural
		remove(plurals, 1)  -- Remove the "~"
		insert(data.inflections, {label = glossary_link("countable") .. " and " .. glossary_link("uncountable")})
		insert(data.categories, langname .. " uncountable nouns")
		insert(data.categories, langname .. " countable nouns")

		-- If no plural was given, add a default one now
		if #plurals == 0 then
			plurals[1] = escape(add_suffix(pagename, "s.plural", pos))
		end
	elseif pos == "proper noun" then
		-- For proper nouns, the default is uncountable
		insert(data.categories, langname .. " uncountable nouns")
	else
		-- For common nouns, the default is countable, has a plural
		insert(data.categories, langname .. " countable nouns")
	end
	-- Plural is unknown
	if plurals[1] == "?" then
		remove(plurals, 1)  -- Remove the "?"
		-- Not desired; see [[Wiktionary:Tea_room/2021/August#"Plural unknown or uncertain"]]
		-- insert(data.inflections, {label = "plural unknown or uncertain"})
		insert(data.categories, langname .. " nouns with unknown or uncertain plurals")
		if #plurals > 0 then
			error("Can't specify explicit plurals along with '?' for unknown/uncertain plural")
		end
		return
	end
	-- Plural is not attested
	if plurals[1] == "!" then
		remove(plurals, 1)  -- Remove the "!"
		insert(data.inflections, {label = "plural not attested"})
		insert(data.categories, langname .. " nouns with unattested plurals")
		if #plurals > 0 then
			error("Can't specify explicit plurals along with '!' for unattested plural")
		end
		return
	end
	-- If no plural was given, maybe add a default one, otherwise (when "-" was given) return.
	if #plurals == 0 then
		if not need_default_plural then
			return
		end
		plurals[1] = escape(add_suffix(pagename, "s.plural", pos))
	end

	-- There are plural forms to show, so show them.
	plurals.label = "plural"
	plurals.accel = {form = "p"}
	local irregular, indeclinable
	for i, pl in ipairs(plurals) do
		local pl_type = type(pl)
		local pl_term = pl_type == "table" and pl.term or pl
		local canon_pl = canonicalize_plural(pl_term, pagename, pos)
		if canon_pl then
			pl_term = canon_pl
			if pl_type == "table" then
				pl.term = pl_term
			else
				plurals[i] = pl_term
			end
		end
		pl_term = get_link_page(pl_term, lang)
		if not (pagename:find(" ") or is_regular_plural(pl_term, pagename)) then
			irregular = true
			if pl_term == pagename then
				indeclinable = true
			end
		end
	end
	if irregular then
		insert(data.categories, langname .. " nouns with irregular plurals")
	end
	if indeclinable then
		insert(data.categories, langname .. " indeclinable nouns")
	end
	
	insert(data.inflections, plurals)
end


-- Return the parameters to be used for nouns and proper nouns. Currently the same.
local function get_noun_params(is_proper)
	return {
		[1] = {list = true, disallow_holes = true},
		["def"] = true,
		["the"] = {alias_of = "def"},
		["pl\1qual"] = {list = true, allow_holes = true},
		-- The following four only used for pluralia tantum (1=p)
		["sg"] = {list = true, disallow_holes = true},
		["sg\1qual"] = {list = true, allow_holes = true},
		["attr"] = {list = true, disallow_holes = true},
		["attr\1qual"] = {list = true, allow_holes = true},
	}
end


pos_functions["nouns"] = {
	params = get_noun_params(false),
	func = do_nouns,
}

pos_functions["proper nouns"] = {
	params = get_noun_params("is proper"),
	func = function(args, data) return do_nouns(args, data, "proper noun") end,
}


local function base_default_verb_forms(verb)
	return escape(add_suffix(verb, "s.verb")), escape(add_suffix(verb, "ing")), escape(add_suffix(verb, "d"))
end


local function default_verb_forms(verb)
	local full_s_form, full_ing_form, full_ed_form = base_default_verb_forms(verb)
	if verb:find(" ") then
		local first, rest = verb:match("^(.-)( .*)$")
		local first_s_form, first_ing_form, first_ed_form = base_default_verb_forms(first)
		return full_s_form, full_ing_form, full_ed_form, first_s_form .. rest, first_ing_form .. rest, first_ed_form .. rest
	else
		return full_s_form, full_ing_form, full_ed_form, nil, nil, nil
	end
end
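
-- Illustrative sketch only: `example_split_first_word` is a hypothetical helper that is not called anywhere in
-- this module. It isolates the pattern used in default_verb_forms() above to show how a multiword verb is divided
-- into its first word and the remainder (which keeps its leading space), so that a suffix can be attached to the
-- first word alone, e.g. "give up" -> "give" .. " up" -> "gives up".
local function example_split_first_word(verb)
	-- The lazy (.-) stops at the first space; ( .*) captures everything from that space onward.
	-- Both captures are nil when the verb contains no space.
	local first, rest = verb:match("^(.-)( .*)$")
	return first, rest
end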


pos_functions["verbs"] = {
	params = {
		[1] = {list = "pres_3sg", allow_holes = true},
		["pres_3sg_qual"] = {list = "pres_3sg\1_qual", allow_holes = true},
		[2] = {list = "pres_ptc", allow_holes = true},
		["pres_ptc_qual"] = {list = "pres_ptc\1_qual", allow_holes = true},
		[3] = {list = "past", allow_holes = true},
		["past_qual"] = {list = "past\1_qual", allow_holes = true},
		[4] = {list = "past_ptc", allow_holes = true},
		["past_ptc_qual"] = {list = "past_ptc\1_qual", allow_holes = true},
		["noautolinkverb"] = {type = "boolean"},
	},
	func = function(args, data)
		-- Get parameters
		local par1 = args[1][1]
		local par2 = args[2][1]
		local par3 = args[3][1]
		local par4 = args[4][1]

		local pres_3sgs, pres_ptcs, pasts, past_ptcs

		local pagename = data.displayed_pagename

		------------------------------------------- UTILITY FUNCTIONS #2 ------------------------------------------

		-- These functions are used both in the separate-parameter format and in the override params such as past_ptc2=.

		local new_default_s, new_default_ing, new_default_ed, split_default_s, split_default_ing, split_default_ed =
			default_verb_forms(pagename)

		local function compute_double_last_cons_stem_of_split_verb(verb, ending)
			local first, rest = verb:match("^(.-)( .*)$")
			if not first then
				error("Verb '" .. verb .. "' must have a space in it to use ++*")
			end
			local last_cons = first:match("([bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ])$")
			if not last_cons then
				error("First word '" .. first .. "' must end in a consonant to use ++*")
			end
			return first .. last_cons .. ending .. rest
		end

		local function check_non_nil_star_form(form)
			if form == nil then
				error("Verb '" .. pagename .. "' must have a space in it to use * or ++*")
			end
			return form
		end

		local function sub_tilde(form)
			if not form then
				return nil
			end
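			-- Escape % in the pagename so that gsub doesn't interpret it as a capture reference in the replacement.
			local escaped_pagename = pagename:gsub("%%", "%%%%")
			local retval = form:gsub("~", escaped_pagename) -- discard second return value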
			return retval
		end
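
		-- Illustrative sketch only: `example_sub_tilde` is a hypothetical helper that is not called anywhere in
		-- this module. It shows, under the assumption that a lemma may contain a literal %, why the lemma should
		-- be escaped before being used as a gsub replacement string: in a replacement, % introduces a capture
		-- reference, so each literal % has to be doubled first.
		local function example_sub_tilde(form, lemma)
			-- Assign to locals because gsub returns two values (the result and the substitution count).
			local escaped_lemma = lemma:gsub("%%", "%%%%")
			local result = form:gsub("~", escaped_lemma)
			return result
		end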

		local function canonicalize_s_form(form)
			if form == "+" then
				return new_default_s
			elseif form == "*" then
				return check_non_nil_star_form(split_default_s)
			elseif form == "++" then
				return compute_plusplus_s_form(pagename, new_default_s)
			elseif form == "++*" then
				if pagename:find("^[^ ]*[sz] ") then
					return compute_double_last_cons_stem_of_split_verb(pagename, "es")
				else
					return check_non_nil_star_form(split_default_s)
				end
			else
				return sub_tilde(form)
			end
		end

		local function canonicalize_ing_form(form)
			if form == "+" then
				return new_default_ing
			elseif form == "*" then
				return check_non_nil_star_form(split_default_ing)
			elseif form == "++" then
				return compute_double_last_cons_stem(pagename) .. "ing"
			elseif form == "++*" then
				return compute_double_last_cons_stem_of_split_verb(pagename, "ing")
			else
				return sub_tilde(form)
			end
		end

		local function canonicalize_ed_form(form)
			if form == "+" then
				return new_default_ed
			elseif form == "*" then
				return check_non_nil_star_form(split_default_ed)
			elseif form == "++" then
				return compute_double_last_cons_stem(pagename) .. "ed"
			elseif form == "++*" then
				return compute_double_last_cons_stem_of_split_verb(pagename, "ed")
			else
				return sub_tilde(form)
			end
		end
		
		local function canonicalize_en_form(form)
			if form == "n" then
				return add_suffix(pagename, "n")
			end
			return canonicalize_ed_form(form)
		end

		--------------------------------- MAIN PARSING/CONJUGATING CODE --------------------------------

		local past_ptcs_given

		if par1 and par1:find("<") then

			-------------------------- ANGLE-BRACKET FORMAT --------------------------

			if par2 or par3 or par4 then
				error("Can't specify 2=, 3= or 4= when 1= contains angle brackets: " .. par1)
			end
			-- In the angle bracket format, we always copy the full past tense specs to the past participle
			-- specs if none of the latter are given, so act as if the past participle is always given.
			-- There is a separate check to see if the past tense and past participle are identical, in any case.
			past_ptcs_given = true

			-- (1) Parse the indicator specs inside of angle brackets.

			local function parse_indicator_spec(angle_bracket_spec)
				local inside = angle_bracket_spec:match("^<(.*)>$")
				assert(inside)
				local segments = iut.parse_balanced_segment_run(inside, "[", "]")
				local comma_separated_groups = iut.split_alternating_runs(segments, ",")
				if #comma_separated_groups > 4 then
					error("Too many comma-separated parts in indicator spec: " .. angle_bracket_spec)
				end

				local function fetch_qualifiers(separated_group)
					local qualifiers
					for j = 2, #separated_group - 1, 2 do
						if separated_group[j + 1] ~= "" then
							error("Extraneous text after bracketed qualifiers: '" .. concat(separated_group) .. "'")
						end
						if not qualifiers then
							qualifiers = {}
						end
						insert(qualifiers, separated_group[j])
					end
					return qualifiers
				end

				local function fetch_specs(comma_separated_group)
					if not comma_separated_group then
						return {{}}
					end
					local specs = {}

					local colon_separated_groups = iut.split_alternating_runs(comma_separated_group, ":")
					for _, colon_separated_group in ipairs(colon_separated_groups) do
						local form = colon_separated_group[1]
						if form == "*" or form == "++*" then
							error("* and ++* not allowed inside of indicator specs: " .. angle_bracket_spec)
						end
						if form == "" then
							form = nil
						end
						insert(specs, {form = form, q = fetch_qualifiers(colon_separated_group)})
					end
					return specs
				end

				local s_specs = fetch_specs(comma_separated_groups[1])
				local ing_specs = fetch_specs(comma_separated_groups[2])
				local ed_specs = fetch_specs(comma_separated_groups[3])
				local en_specs = fetch_specs(comma_separated_groups[4])
				for _, spec in ipairs(s_specs) do
					if spec.form == "++" and #ing_specs == 1 and not ing_specs[1].form and not ing_specs[1].q
						and #ed_specs == 1 and not ed_specs[1].form and not ed_specs[1].q then
						ing_specs[1].form = "++"
						ed_specs[1].form = "++"
						break
					end
				end

				return {
					forms = {},
					s_specs = s_specs,
					ing_specs = ing_specs,
					ed_specs = ed_specs,
					en_specs = en_specs,
				}
			end

			local parse_props = {
				parse_indicator_spec = parse_indicator_spec,
			}
			local alternant_multiword_spec = iut.parse_inflected_text(par1, parse_props)

			-- (2) Check for user-specified brackets; remove any links from the lemma, but remember the original
			--     form so we can use it below in the 'lemma_linked' form.

			-- Check to see if there are brackets in the pre-text or post-text. If so, use the linked lemma (with the
			-- verb autolinked unless noautolinkverb is given). Otherwise, use the default headword algorithm.
			local function check_bracket(val)
				if val:find("%[%[") then
					alternant_multiword_spec.saw_bracket = true
				end
			end
			for _, alternant_or_word_spec in ipairs(alternant_multiword_spec.alternant_or_word_specs) do
				check_bracket(alternant_or_word_spec.before_text)
				if alternant_or_word_spec.alternants then
					for _, multiword_spec in ipairs(alternant_or_word_spec.alternants) do
						for _, word_spec in ipairs(multiword_spec.word_specs) do
							check_bracket(word_spec.before_text)
						end
						check_bracket(multiword_spec.post_text)
					end
				end
			end
			check_bracket(alternant_multiword_spec.post_text)

			iut.map_word_specs(alternant_multiword_spec, function(base)
				if base.lemma == "" then
					base.lemma = pagename
				end
				base.orig_lemma = base.lemma
				base.lemma = remove_links(base.lemma)
				if args.noautolinkverb or base.orig_lemma:find("%[%[") then
					base.linked_lemma = base.orig_lemma
				else
					base.linked_lemma = "[[" .. base.orig_lemma .. "]]"
				end
			end)

			-- (3) Conjugate the verbs according to the indicator specs parsed above.

			local all_verb_slots = {
				lemma = "infinitive",
				lemma_linked = "infinitive",
				s_form = "3|s|pres",
				ing_form = "pres|ptcp",
				ed_form = "past",
				en_form = "past|ptcp",
			}
			local function conjugate_verb(base)
				local def_s_form, def_ing_form, def_ed_form = base_default_verb_forms(base.lemma)

				local function process_specs(slot, specs, default_form, canonicalize_plusplus)
					for _, spec in ipairs(specs) do
						local form = spec.form
						if not form or form == "+" then
							form = default_form
						elseif form == "++" then
							form = canonicalize_plusplus()
						end
						-- If there's a ~ in the form, substitute it with the lemma,
						-- but make sure to first replace % in the lemma with %% so that
						-- it doesn't get interpreted as a capture replace expression.
						if form:find("~") then
							-- Assign to a var because gsub returns multiple values.
							local subbed_lemma = base.lemma:gsub("%%", "%%%%")
							form = form:gsub("~", subbed_lemma)
						end
						-- If the form is -, don't insert any forms, which will result
						-- in there being no overall forms (in fact it will be nil).
						-- We check for that down below and substitute a single "-" as
						-- the form, which in turn gets turned into special labels like
						-- "no present participle".
						if form ~= "-" then
							iut.insert_form(base.forms, slot, {form = form, footnotes = spec.q})
						end
					end
				end

				process_specs("s_form", base.s_specs, def_s_form,
					function() return compute_plusplus_s_form(base.lemma, def_s_form) end)
				process_specs("ing_form", base.ing_specs, def_ing_form,
					function() return compute_double_last_cons_stem(base.lemma) .. "ing" end)
				process_specs("ed_form", base.ed_specs, def_ed_form,
					function() return compute_double_last_cons_stem(base.lemma) .. "ed" end)

				-- If the -en spec is completely missing, substitute the -ed spec in its entirety.
				-- Otherwise, if individual -en forms are missing or use +, we will substitute the
				-- default -ed form, as with the -ed spec.
				local en_specs = base.en_specs
				if #en_specs == 1 and not en_specs[1].form and not en_specs[1].q then
					en_specs = base.ed_specs
				end

				process_specs("en_form", en_specs, def_ed_form,
					function() return compute_double_last_cons_stem(base.lemma) .. "ed" end)

				iut.insert_form(base.forms, "lemma", {form = base.lemma})
				-- Add linked version of lemma for use in head=. We write this in a general fashion in case
				-- there are multiple lemma forms (which isn't possible currently at this level, although it's
				-- possible overall using the ((...,...)) notation).
				iut.insert_forms(base.forms, "lemma_linked", iut.map_forms(base.forms.lemma, function(form)
					if form == base.lemma and base.linked_lemma:find("%[%[") then
						return base.linked_lemma
					else
						return form
					end
				end))
			end

			local inflect_props = {
				slot_table = all_verb_slots,
				inflect_word_spec = conjugate_verb,
			}
			iut.inflect_multiword_or_alternant_multiword_spec(alternant_multiword_spec, inflect_props)

			-- (4) Fetch the forms and put the conjugated lemmas in data.heads if not explicitly given.

			local function fetch_forms(slot)
				local forms = alternant_multiword_spec.forms[slot]
				-- See above. This should only occur if the user explicitly used -
				-- for a spec.
				if not forms or #forms == 0 then
					forms = {{form = "-"}}
				end
				return forms
			end

			pres_3sgs = fetch_forms("s_form")
			pres_ptcs = fetch_forms("ing_form")
			pasts = fetch_forms("ed_form")
			past_ptcs = fetch_forms("en_form")
			-- Use the "linked" form of the lemma as the head if no head= explicitly given and the user specified brackets
			-- in one of the lemmas. Otherwise we use the default headword-linking algorithm.
			if #data.user_specified_heads == 0 and alternant_multiword_spec.saw_bracket then
				data.heads = {}
				for _, lemma_obj in ipairs(alternant_multiword_spec.forms.lemma_linked) do
					local quals, refs = iut.convert_footnotes_to_qualifiers_and_references(lemma_obj.footnotes)
					insert(data.heads, {term = lemma_obj.form, q = quals, refs = refs})
				end
			end
		else
			-------------------------- SEPARATE-PARAM FORMAT --------------------------

			local pres_3sg, pres_ptc, past

			if par1 and not (par2 or par3 or par4) then
				-- Use of a single parameter other than "++", "*" or "++*" is now the "legacy" format,
				-- and no longer supported.
				if par1 == "es" or par1 == "ies" or par1 == "d" then
					error("Legacy parameter 1=es/ies/d no longer supported, just use 'en-verb' without params")
				elseif par1 == "++" or par1 == "*" or par1 == "++*" then
					pres_3sg = canonicalize_s_form(par1)
					pres_ptc = canonicalize_ing_form(par1)
					past = canonicalize_ed_form(par1)
				else
					error("Legacy parameter 1=STEM no longer supported, just use 'en-verb' without params")
				end
			else
				if par4 then
					track("xxx4")
				elseif par3 then
					track("xxx3")
				elseif par2 then
					track("xxx2")
				end
			end

			if not pres_3sg or not pres_ptc or not past then
				-- Either all three should be set above, or none of them.
				assert(not pres_3sg and not pres_ptc and not past)

				if par1 then
					pres_3sg = canonicalize_s_form(par1)
				else
					pres_3sg = new_default_s
				end

				if par2 then
					pres_ptc = canonicalize_ing_form(par2)
				else
					pres_ptc = new_default_ing
				end

				if par3 then
					past = canonicalize_ed_form(par3)
				else
					past = new_default_ed
				end
			end

			local past_ptc
			if par4 then
				past_ptcs_given = true
				past_ptc = canonicalize_en_form(par4)
			else
				past_ptc = past
			end

			pres_3sgs = {{form = pres_3sg}}
			pres_ptcs = {{form = pres_ptc}}
			pasts = {{form = past}}
			past_ptcs = {{form = past_ptc}}
		end

		------------------------------------------- HANDLE OVERRIDES ------------------------------------------

		local function strip_brackets(qualifiers)
			if not qualifiers then
				return nil
			end
			local stripped_qualifiers = {}
			for _, qualifier in ipairs(qualifiers) do
				local stripped_qualifier = qualifier:match("^%[(.*)%]$")
				if not stripped_qualifier then
					error("Internal error: Qualifier should be surrounded by brackets at this stage: " .. qualifier)
				end
				insert(stripped_qualifiers, stripped_qualifier)
			end
			return stripped_qualifiers
		end

		local function collect_forms(label, accel_form, defaults, overrides, override_qualifiers, canonicalize)
			if defaults[1].form == "-" then
				return {label = "no " .. label}
			else
				local into_table = {label = label, accel = {form = accel_form}}
				local maxindex = math.max(#defaults, overrides.maxindex)
				local qualifiers = override_qualifiers[1] and {override_qualifiers[1]} or strip_brackets(defaults[1].footnotes)
				insert(into_table, {term = defaults[1].form, q = qualifiers})

				-- Remaining forms (index 2 and up): an override, if given, takes precedence over the default.
				for i = 2, maxindex do
					local override_form = canonicalize(overrides[i])

					if override_form then
						-- If there is an override such as past_ptc2=..., only use the qualifier specified
						-- using an override (past_ptc2_qual=...), if any; it doesn't make sense to combine
						-- an override form with a qualifier specified inside of angle brackets.
						insert(into_table, {term = override_form, q = {override_qualifiers[i]}})
					elseif defaults[i] then
						-- If the form comes from inside angle brackets, allow any override qualifier
						-- (past_ptc2_qual=...) to override any qualifier specified inside of angle brackets.
						-- FIXME: Maybe we should throw an error here if both exist.
						local qualifiers = override_qualifiers[i] and {override_qualifiers[i]} or strip_brackets(defaults[i].footnotes)
						insert(into_table, {term = defaults[i].form, q = qualifiers})
					end
				end

				return into_table
			end
		end

		local pres_3sg_infls = collect_forms("third-person singular simple present", "s-verb-form",
			pres_3sgs, args[1], args.pres_3sg_qual, canonicalize_s_form)
		local pres_ptc_infls = collect_forms("present participle", "ing-form",
			pres_ptcs, args[2], args.pres_ptc_qual, canonicalize_ing_form)
		local past_infls = collect_forms("simple past", "spast",
			pasts, args[3], args.past_qual, canonicalize_ed_form)
		local past_ptc_infls = collect_forms("past participle", "past|part",
			past_ptcs, args[4], args.past_ptc_qual, canonicalize_en_form)

		-- Are the past forms identical to the past participle forms? If so, we use a single
		-- combined "simple past and past participle" label on the past tense forms.
		-- We check for two conditions: Either no past participle forms were given at all, or
		-- they were given but are identical in every way (all forms and qualifiers) to the past
		-- tense forms. The former "no explicit past participle forms" check is important in the
		-- "separate-parameter" format; if past tense overrides are given and no past participle
		-- forms given, the past tense overrides should apply to the past participle as well.
		-- In the angle-bracket format, it's expected that all forms and qualifiers are specified
		-- using that format, and we explicitly copy past tense forms and qualifiers to past
		-- participle ones if the latter are omitted, so we disable the "no explicit past participle
		-- forms" check.
		if args[4].maxindex > 0 or args.past_ptc_qual.maxindex > 0 then
			past_ptcs_given = true
		end

		local identical = true

		-- For the past and past participle to be identical, there must be
		-- the same number of inflections, and each inflection must match
		-- in term and qualifiers.
		if #past_infls ~= #past_ptc_infls then
			identical = false
		else
			for key, val in ipairs(past_infls) do
				if past_ptc_infls[key].term ~= val.term then
					identical = false
					break
				else
					local quals1 = past_ptc_infls[key].q
					local quals2 = val.q
					if (not not quals1) ~= (not not quals2) then
						-- one is nil, the other is not
						identical = false
					elseif quals1 and quals2 then
						-- qualifiers present in both; each qualifier must match
						if #quals1 ~= #quals2 then
							identical = false
						else
							for k, v in ipairs(quals1) do
								if v ~= quals2[k] then
									identical = false
									break
								end
							end
						end
					end
					if not identical then
						break
					end
				end
			end
		end

		-- Insert the forms
		insert(data.inflections, pres_3sg_infls)
		insert(data.inflections, pres_ptc_infls)

		if not past_ptcs_given or identical then
			if past_ptcs[1].form == "-" then
				past_infls.label = "no simple past or past participle"
			else
				past_infls.label = "simple past and past participle"
				past_infls.accel = {form = "ed-form"}
			end
			insert(data.inflections, past_infls)
		else
			insert(data.inflections, past_infls)
			insert(data.inflections, past_ptc_infls)
		end

		if pagename:find(" ") then
			-- Check for placeholder "it"
			local words = split(pagename, " ")
			for _, word in ipairs(words) do
				if word == "it" or word == "its" or word == "it's" then
					insert(data.categories, langname .. ' terms with placeholder "it"')
					break
				end
			end

			-- Check for phrasal verbs
			local phrasal_adverbs = list_to_set{
				-- NOTE: This should only contain common phrasal adverbs, not random words like [[low]],
				-- [[adrift]], etc.
				"aback",
				"about",
				"above",
				"across",
				"after",
				"against",
				"ahead",
				"along",
				"apart",
				"around",
				"as",
				"aside",
				"at",
				"away",
				"back",
				"before",
				"behind",
				"below",
				"between",
				"beyond",
				"by",
				"down",
				"for",
				"forth",
				"from",
				"in",
				"into",
				"of",
				"off",
				"on",
				"onto",
				"out",
				"over",
				"past",
				"round",
				"through",
				"to",
				"together",
				"towards",
				"under",
				"up",
				"upon",
				"with",
				"without",
			}
			local allowed_non_adverb_words = list_to_set{
				"it",
				"one",
				"oneself",
				"someone",
			}
			local base = pagename
			local seen_adverbs = {}
			-- Only consider a verb to be phrasal if it consists of a single base verb followed exclusively by either
			-- adverbs from `phrasal_adverbs` or placeholder words from `allowed_non_adverb_words`, where at
			-- least one following word is from `phrasal_adverbs` (hence [[can it]] is not a phrasal verb).
			while true do
				local prev, word = base:match("^(.+) (.-)$")
				if not prev then
					break
				end
				if phrasal_adverbs[word] then
					insert(seen_adverbs, word)
				elseif allowed_non_adverb_words[word] then
					-- do nothing
				else
					break
				end
				base = prev
			end
			if not base:find(" ") and #seen_adverbs > 0 then
				insert(data.categories, langname .. " phrasal verbs")
				for i = #seen_adverbs, 1, -1 do
					insert(data.categories, langname .. ' phrasal verbs formed with "' .. seen_adverbs[i] ..
						'"')
				end
			end
		end
	end
}
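
-- Illustrative sketch only: `example_find_phrasal_parts` is a hypothetical helper that is not called anywhere in
-- this module. It restates the phrasal-verb check above as a standalone function, assuming the caller passes in
-- two sets (tables keyed by word) standing in for `phrasal_adverbs` and `allowed_non_adverb_words`. It returns the
-- base verb and the adverbs in left-to-right order, or nil if the term does not qualify as a phrasal verb.
local function example_find_phrasal_parts(pagename, adverbs, placeholders)
	local base = pagename
	local seen = {}
	while true do
		-- Peel off the last space-separated word; `prev` is everything before it.
		local prev, word = base:match("^(.+) (.-)$")
		if not prev then
			break
		end
		if adverbs[word] then
			-- Words are visited right to left, so insert at the front to keep left-to-right order.
			table.insert(seen, 1, word)
		elseif not placeholders[word] then
			break
		end
		base = prev
	end
	-- Require a single-word base verb and at least one adverb (so "can it" does not qualify).
	if base:find(" ") or #seen == 0 then
		return nil
	end
	return base, seen
end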

return export