Jump to content

User talk:Sean.hoyland

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia
Khirbet 'Ein Karzaliyah (Arabic: خربة عين كرزلية), Jordan Valley: December 2013 - January 2014
Id'eis (Arabic: ادعيس), Jordan Valley: May 2014

Historic cu data

Hi - I've come here because that particular SPI probably isn't the best place to discuss general stuff about cu data - better to keep the archive uncluttered.

The short answer is that there is no way for non CUs to tell how much historic data is available, if any. Even administrators and SPI clerks can't see it - you need the CU flag in order to have any access to the places where it's visible. I won't go into too much detail about the types of info that are available, but in broad terms there is almost always some information available about accounts which have been checked in the past.

If you have suspicions about an account, I'd urge you not to factor whether the old accounts are likely to be stale into your decision about whether or not to report - if you have behavioural evidence, report it. We would need that evidence anyway to justify a check if the data is available, and if it's not, behavioural evidence can be strong enough to block an account without the need for a cu hit. Hope that's helpful. Girth Summit (blether) 13:48, 8 November 2024 (UTC)[reply]

Thanks, yes, that's very helpful. Sean.hoyland (talk) 14:27, 8 November 2024 (UTC)[reply]
Can we make Sean an admin just so he can better explore CU stuff? BilledMammal (talk) 14:58, 8 November 2024 (UTC)[reply]
@BilledMammal: - Sean can request adminship in the usual way (or I guess I should say one of the usual ways, now that we're in the era of admin elections), if he's interested, of course. As I said though, admins can't see any of this stuff either unless they have the CU bit. The fastest way for anyone to get that just now would probably be to get elected onto Arbcom, the candidates list is rather short at the moment...
@Sean.hoyland: - as an afterthought, I'd like to add that the possibility of them running multiple accounts in parallel did occur to me. I always check for them, but I looked more carefully than I might otherwise have done, in light of the previous cases. All I can say is that some of their editing (but not the majority) comes from a shared IP address, and there are a few other accounts on that IP, any of which might be them, but based on a combination of technical and behavioural observations, I think that unlikely. Certainly, none of them are interested in any of the same subject matter, none of them get involved in discussions or articles that the others are involved in, and it looks for all the world to me like they're all innocently using an institutional internet connection that multiple people have access to. Most of their editing is coming from private IPs, which do not have any other traffic on them. Now, there's no way that CU could detect someone using multiple accounts if they are careful to use different devices and internet connections for each one; all I can say is that if they're doing that, they're being a lot more careful about it now than they have been in the past. Girth Summit (blether) 15:27, 8 November 2024 (UTC)[reply]
The way I look at it is that if I'm willing to waive anonymity, I should have access to the private information currently redacted from the databases for the other 48 million accounts. There might be a flaw in this logic, but I'm just not seeing it. Thanks for the extra details, interesting. Sean.hoyland (talk) 15:46, 8 November 2024 (UTC)[reply]
Me too! Unfortunately, it's not unfettered access. Every time I run a check on an account, or an IP, that action is permanently logged, and other CUs can see what I'm up to. They even audit my activity (the cheek!) If I run inappropriate checks, some pesky ombud or arb will come along and take my fancy permissions away. It's so unreasonable! Girth Summit (blether) 22:45, 8 November 2024 (UTC)[reply]

Looking for a tool

..that does the same as

but also gives you the number of pages you have edited in common. Does that exist? (Just looking for an easy way to find how many pages Icewhiz and I have in common ) Huldra (talk) 23:18, 16 November 2024 (UTC)[reply]

Not that I'm aware of, and I'm not sure of the accuracy of the edit counts produced by editorinteract.py. For example, look at you vs Galamore for 1929 Hebron massacre. It says you=2, Galamore=1. But you made 5 edits to that article, and it should know that because the link to sigma timeline.py says so, as does the sigma usersearch.py.
Anyway, if you are just after something quick to see the number of pages in common you can do something like below.
select 
convert(replace(p.page_title,'_',' ') using utf8mb4) page_title, 
p.page_namespace, 
p.page_is_redirect,
p2.rev_count 'Huldra+Icewhiz rev_count'
from page p
join (
select ru.rev_page, count(ru.rev_id) as rev_count
from revision_userindex ru
join actor_revision ar on ar.actor_id = ru.rev_actor
where ar.actor_name in ('Huldra', 'Icewhiz')
group by ru.rev_page
having count(distinct ar.actor_id)=2
) p2
on p2.rev_page = p.page_id
order by 1,2,3
You can use Quarry - see here -> Resultset (598 rows) Sean.hoyland (talk) 05:50, 17 November 2024 (UTC)[reply]
To editor Huldra: Enter "enwiki_p" into the little box on the left above the table, copy the code into the black area, hit "Submit Query". I'm obviously more evil than you because I get 639 rows in common with Icewhiz. But you and I have 2719 in common. Zerotalk 12:49, 17 November 2024 (UTC)[reply]
Strictly speaking, for intersections with Icewhiz, you would need to combine the results for 48 accounts, but I keep putting off thinking about article intersection information as it's another rabbit hole. It would be nice to be able to quantify the likelihood of intersections.
  • ['007Леони́д', '11Fox11', 'AnnieGrannyBunny', 'Astral Leap', 'AstuteRed', 'Bob not snob', 'DoraExp', 'Double barrel pistol with both opposite direction', 'EnfantDeLaVille', 'Eostrix', 'Free1Soul', 'Galamore', 'Geshem Bracha', 'Herpetogenesis', 'Hippeus', 'I dream of Maple', 'Icewhiz', 'Jacinda01', 'JoeZ451', 'Just Prancing', 'KasiaNhersL', 'LeftDreams', 'ManoelWild', 'Minden500', 'Molave Quinta', 'Mrboondocks', 'Mvqr', 'O.maximov', 'OdNahlawi', 'PeleYoetz', 'Pikavoom', 'PRL Dreams', 'Proud Indian Arnab', 'Purski', 'RCatesby', 'SCNBAH', 'Seggallion', 'Semper honestus', 'Smoking Ethel', 'SunSun753457', 'Świst lodu', 'Szymon Frank', 'The 2nd coming of Purski', 'UnspokenPassion', 'Uppagus', 'VikingDrummer', 'WhizICE', 'Терпение не ненавижу']
Sean.hoyland (talk) 13:10, 17 November 2024 (UTC)[reply]
ok, thanks, Sean, much appreciated. Obviously, both Zero and I have been conspiring with Icewhiz offline.
However, I am not sure I trust those numbers either; why are some article mentioned once, and others twice? Eg, 1917 Jaffa deportation is mentioned twice, while 1929 Hebron massacre is mentioned once? cheers, Huldra (talk) 20:41, 17 November 2024 (UTC)[reply]
@Huldra: It's because different name spaces are listed separately (see the page_namespace column). 0=article, 1=talk, 2=user, 3=user talk, 4=WP, 5=WT, 10=template, 11=template talk, etc.. So the counts are of "pages" not of "articles". Zerotalk 00:08, 18 November 2024 (UTC)[reply]
Huldra+11Fox11=153, Huldra+Galamore=98, Huldra+Geshem Bracha=100, Huldra+PeleYoetz=88, Huldra+Pikavoom=325(!), etc., but there are page overlaps so it isn't correct to add these numbers. Sean, how to test this list of socks all at once? Zerotalk 02:13, 18 November 2024 (UTC)[reply]
Huldra, if you prefer the namespace numbers as descriptions you can translate them like this. Zero0000, I wouldn't do that with SQL. It's easier to handle the data using Pandas or Polars. I'll have a look a bit later. It's a question I've been avoiding or letting marinate for months for various reasons e.g. it's not clear to me how to extract useful information, something like the significance of an intersection, from article intersections. For a start, I don't know what the intersection statistics for a large set of random users with various account ages, editcounts, interests etc. looks like, the background stats, so how can I know whether something is significant or not? It seems possible, in principle at least, to write down a function that estimates the improbability of an intersection based on page revision counts, page unique editor counts, editor editcounts, account age, and various other things that I haven't figured out yet. Sean.hoyland (talk) 04:41, 18 November 2024 (UTC)[reply]
Huldra, unfortunately the evidence suggests that the extent of your conspiracy with Icewhiz and their socks is even more concerning, spanning 1438 pages, a number that is apparently 'extraordinarily high'. There are 3 google sheets here.
A. df_page_intersect - lists editcounts for page intersections between you and each of Icewhiz's accounts.
B. df_page_intersect_sum - same as A but with Icewhiz+socks editcounts summed.
C. df_page_intersect_sock_sum_pivot - pivoted version of B
Zero, I'll see if there is a way to do this only using SQL that isn't horrendously ugly. Sean.hoyland (talk) 10:40, 18 November 2024 (UTC)[reply]
1438 pages! I am clutching my pearls, hoping nobody fins out! (I thought of uploading a picture of me, clutching my pearls, but unfortunately my android phone doesn't communicate easily with my Mac) Luckily, those google sheets you linked to is proof of Selfstudier's work with Icewhiz -not mine. cheers, Huldra (talk) 22:08, 18 November 2024 (UTC)[reply]
I was thinking select pages with any of ('Huldra', 'Icewhiz', 'sock', ...), then of those select pages with Huldra and distinct count at least 2. But I didn't get it to work. The purpose is to dispel the idea that such page intersections prove anything other than similar interests, and I agree that an objective measure of how significant a number is is probably difficult or impossible to find. A more direct measure of working together might be to count how many times A restored an edit of B that had been reverted and vice-versa. But that seems tricky as well as hard to calibrate. Zerotalk 11:02, 18 November 2024 (UTC)[reply]
For me, the interest in intersections predates things like the Piratewires nonsense. It stems from two opposites, 1) an SPI case years ago where a highly improbable intersection played a successful role, and 2) an SPI case where a checkuser request was declined because the large number of intersections were regarded as not compelling. The editor had made a lot of edits and many of the pages were high traffic pages despite many improbable overlaps at pages with low pageviews, revision counts etc. Seems like a clue that somewhere in between those 2 outcomes is a better way to extract, integrate and present page intersection evidence. As for the SQL, even getting all of an account's socks is tricky because the category graphs for sockmasters are almost always incomplete e.g. categories only get 39 of the Icewhiz sock accounts. Sean.hoyland (talk) 12:54, 18 November 2024 (UTC)[reply]
How do I find the extent of my conspiracy with Icewhiz and their socks? :) Selfstudier (talk) 10:48, 18 November 2024 (UTC)[reply]
To editor Selfstudier: Good try, we all know you are Icewhiz. Zerotalk 11:04, 18 November 2024 (UTC)[reply]
Well, step one is probably me fixing the bit where I forgot to add usernames to the output so you can tell who is being compared to who (whom?). Sean.hoyland (talk) 11:07, 18 November 2024 (UTC)[reply]
Step two, results for you are in there now. Only 542 pages, an 'incredible abundance' obviously, but not 'extraordinarily high'...probably. Sean.hoyland (talk) 11:27, 18 November 2024 (UTC)[reply]
Some memories there, thanks. Selfstudier (talk) 11:32, 18 November 2024 (UTC)[reply]

<- Zero0000, here are 2 versions of SQL that produce pivoted intersection results.

  • version A cheats by hard-coding in the 48 Icewhiz accounts. It's fast (9s), but I think it is missing some results and I don't know why yet.
  • version B does it properly by actually selecting all of the accounts. It seems to produce a complete set of results but it's slow (650s), even though the sock selection part is okay...searching block comments always takes a while. Something about the structure is causing the server to use an inefficient execution plan I guess. Not sure what to do about that. Sean.hoyland (talk) 18:19, 18 November 2024 (UTC)[reply]
I am reading the SQL-article, in order to understand what you guys are doing. Unfortunately, it is looks like a typical wikipedia science article: difficult for anyone not knowing what it is about. (But presumably quite clear to those who already know SQL;/), oh well, Huldra (talk) 22:16, 18 November 2024 (UTC)[reply]
SQL was designed by an evil demon. Incidentally Piotrus gets a score of 1942, more evil than any of us. Zerotalk 01:56, 19 November 2024 (UTC)[reply]
This is where all that apparently useless stuff at school about Venn diagrams and set symbols finally has a chance to be useful. SQL just builds sets of things then connects them together like making your own Lego bricks and building something. But this is one area where the LLMs like Claude and ChatGPT really shine because they have looked at millions of lines of SQL. I almost never document any code I write in any language because I'm not a software engineer. It's boring. And I have 'eternal sunshine of the spotless mind' when it comes to code I write. One week is enough time for me to completely forget almost everything about a piece of code and think 'who wrote this garbage?'. Now you can outsource the explaining/documentation to AIs. They're very good at it, much better than their ability to write code. Try putting the SQL in Claude for example and asking it to explain it. Sean.hoyland (talk) 05:13, 19 November 2024 (UTC)[reply]

Zero0000, Huldra, I've put a generic-ish query in Quarry.

  • version C - you can just specify a reference account name and a comma separated list of one or more other accounts to compare it with.

Sean.hoyland (talk) 16:02, 19 November 2024 (UTC)[reply]

Sean.hoyland: much appreciated, Huldra (talk) 21:06, 19 November 2024 (UTC)[reply]

The slow performance of the server for a query where you only need to specify the sockmaster rather than tediously hardcode the sock list was too annoying for me to let go. I've added another version that seems to persuade the server to use a decent plan e.g. 120s vs 650s.

Sean.hoyland (talk) 11:23, 20 November 2024 (UTC)[reply]

Putting some stuff here. What do you think of my way to guess at the number of reverts that are not ECR reverts? Another thing: the number of revisions marked "mw-reverted" is significantly higher than the number marked "mw-undo or mw-rollback". This suggests that the way reverts are detected has more cases. Do you know where it is described? Zerotalk 12:14, 20 November 2024 (UTC)[reply]

I'll have a look and get back to you. But for the second question, I noticed a manual revert tag in my watchlist for the first time the other day. I assume it's ctd_name='mw-manual-revert', ctd_id=582. Special:Tags says it's "Edits that manually restore the page source to an exact previous state". 4 million revisions have been tagged so I guess that tag might account for some of the cases. Sean.hoyland (talk) 12:36, 20 November 2024 (UTC)[reply]
The not-ECR guess seems like it might be error prone. The PIA 0 and 1 namespaces probably have quite different protection statistics, especially high traffic pages (I haven't checked). That might mess up the assumption. It's possible to search for ECR related strings in the revision comments but it is likely to be slow and incomplete. It might be worth trying. I see what you mean about the tag stats. Not sure what is going on there. There might be an explanation buried somewhere if you start at Wikipedia:Tags. There's also 'app-undo' = Undo actions made from the mobile apps. But there are only 26 revisions with that tag in PIA (0,1) namespaces between 2020-10-07 and 2024-10-06. Sean.hoyland (talk) 17:40, 20 November 2024 (UTC)[reply]
Zero0000, the counts should probably exclude bot revisions shouldn't they? If you want to do that you can add something like the following to the where clause or the join. It's tempting to write 'x not in (...)', but it wouldn't work in this case because the actor_user can be null (for IPs) so a 'not in' would miss the unregistered/non-user actors and their revisions.
actor_revision.actor_user in (
 	select ug.ug_user
 	from user_groups ug
 	where ug_group = 'bot'
 ) is not true
For 2020-10-07 to 2021-10-06 that would change the edit count from 96687 to 84775 for example.
As for the mysterious tagging, looking at one page might help e.g. Israeli invasion of the Gaza Strip (disambiguation). The bot vs human back-and-forth in recent revisions is interesting e.g. this bot edit wasn't tagged, even though it looks like it should be treated as a revert.
  • 2024-06-09T10:02:38 Bot1058 talk contribs m 272 bytes (−296)
Sean.hoyland (talk) 07:14, 21 November 2024 (UTC)[reply]
Adding mw-manual-revert to the definition of "revert" does bump up the revert count for recent years. Still not up to the "reverted" count, though. I'm particularly interested in splitting the "reverted" count according to whether the actor was extended-confirmed at the time. Zerotalk 11:20, 21 November 2024 (UTC)[reply]
Just a note: editors who have been here long/have edited many different pages will of course have a greater interaction than "newbies". Which will explain that Zero and I are apparently Siamese twins, joined at the hip. To the extent these interaction numbers can have any meaning at all, they must be adjusted somehow against the number of pages the editor(s) have edited. Any thoughts about how such an adjustment could be made? Huldra (talk) 21:21, 20 November 2024 (UTC)[reply]
I have a pile of disordered thoughts about those kinds of adjustments that have accumulated over the past months, none of which are helpful. To do it properly seems quite complicated. You can imagine that the likelihood of a page intersection between 2 actors is proportional to various numbers and inversely proportional to various other numbers, but those numbers, and there are probably many different factors related to the page, the editors, all other editors, the state of Wikipedia, change over time. Sean.hoyland (talk) 05:08, 21 November 2024 (UTC)[reply]
Right. To define a statistically meaningful measure you first need a model of independent editing. I have thought about it but it is quite a tough problem. Finally, it is impossible for raw counts to distinguish between friends and enemies. To make the distinction one would need to take into account things like how often they reverted each other. Zerotalk 11:20, 21 November 2024 (UTC)[reply]

ArbCom 2024 Elections voter message

Hello! Voting in the 2024 Arbitration Committee elections is now open until 23:59 (UTC) on Monday, 2 December 2024. All eligible users are allowed to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2024 election, please review the candidates and submit your choices on the voting page. If you no longer wish to receive these messages, you may add {{NoACEMM}} to your user talk page. MediaWiki message delivery (talk) 00:15, 19 November 2024 (UTC)[reply]