User talk:Julian Jarosch (digicademy)



Previous discussion was archived at User talk:Julian Jarosch (digicademy)/Archive 1 on 2020-02-07.

Jura1 (talkcontribs)
Julian Jarosch (digicademy) (talkcontribs)

Yes, I also saw the update via Twitter! I manually updated the catalogue on Monday a week ago, and it worked well. (Though I prematurely selected “update weekly”, so now an update job runs every Monday with static data.)

I’m also interested in getting a fully automated update job to run, but I have to talk to a colleague about the changes needed for that. I’m sure we’ll get around to it, but not as the highest priority. In the meantime, I’ll run manual updates at least occasionally.

Julian Jarosch (digicademy) (talkcontribs)

(I just did a manual update again. The recurring job, which wasn’t ready yet, is deactivated again. I also purged the automatches, because none of them had instance of (P31) family name (Q101352).)

Jura1 (talkcontribs)

I had tried to set up an automated import at https://mix-n-match.toolforge.org/#/import based on the URLs provided on the property talk page, but the tool didn't seem to find column headers.

BTW, maybe you want to include a column "P282" with the value "Q8229" in the file.

Jura1 (talkcontribs)

I will try to create some of the remaining ones. I noticed some entries I had created previously, but someone tried to correct my spelling ;)

Also, some entries in MxM don't have a type defined; maybe this led to mismatches.

BTW, default description in English is "family name" (all lowercase).

Julian Jarosch (digicademy) (talkcontribs)

Thank you for the item creations! I don’t keep a record of exact numbers, but that must have been around 6000 items?

Thanks also for your suggestions for the import file. Yes, the CSV table we provide currently has no header; we used the format as it was specified by Magnus in 2019. Adding P282 seems a good idea. Regarding the description, I faintly recall that having an English description is a mismatch with the catalogue's main language, which is German. (I just never got around to following up on that until now.) In my view, having German as the main language of the M’n’M catalogue seems appropriate – then, the description probably should be »Familienname«.

Also, good catch regarding the missing types. I hadn’t noticed that. The type is missing on ca. 9000 items, which is the number added last week with the new import tool. I’ll try to set the type (again?) in the catalogue editor, or if necessary for each item individually in the import.

I mean to test the changes in the import table first with manual uploads. The next chance to test new data will be March 1. Once the details are settled, I’ll look into if/when my colleague can get around to changing the format permanently.

Jura1 (talkcontribs)

I tried to run an update on the existing MxM catalogue, but it didn't seem to work. Maybe it's because I didn't create it or because there is some bug. For existing names, now that most are matched, it doesn't matter that much. It could be interesting to have new ones automatically loaded into MxM (when the format and the import function work), but the current approach gives a similar result.

"Chart by item creation date" on the property talk page shows when the items were created. The tricky cases are still those where an existing item is incomplete or was incorrectly merged.

At Wikidata:WikiProject_Names/reports/given_names/Italian, I set up a few checks for given names. I can try to adapt more as complex constraints for the property.

I also added the newly created items to a few persons as family name (P734) values. This chart shows when the items for these people were created.

Julian Jarosch (digicademy) (talkcontribs)

Yes, automating the M’n’M import fully is what I’m also keenly interested in. The DFD is set to continue publishing around 400 new articles every two weeks for the next 14 years. Having each publication automagically in M’n’M would be nice. I’ll certainly try to make progress on this when time allows :-)

Jura1 (talkcontribs)

At #4200, I made another attempt with MxM. With correctly defined columns, the import seems to work ;) Feel free to experiment with the catalogue.

Tabbed import file was:

id	name	description	type	P282	note
1009673	Melnykov	family name	Q101352	Q8229	actual data from MxM
10103	Hussein	family name	Q101352	Q8229	actual data from MxM
1021719	Melnyczenko	family name	Q101352	Q8229	actual data from MxM
103306	Tay	family name	Q101352	Q8229	actual data from MxM
104370	Lichteblau	family name	Q101352	Q8229	actual data from MxM
106519	Gollob	family name	Q101352	Q8229	actual data from MxM
9999999	Testing	family name	Q101352	Q8229	fake name without id
9999998	Aaker	family name	Q101352	Q8229	existing item without id
1	Müller	family name	Q101352	Q8229	sample from property
1095930	von Poppen	family name	Q101352	Q8229	sample from property
  • type and writing system were correctly imported.
  • auxiliary matching still gave suboptimal results, "Testing" shouldn't have matched "Test"
  • At https://mix-n-match.toolforge.org/#/jobs/4200 there is a "taxon matcher". Ideally we would have something similar based on native label (P1705) or exact labels.
  • In a second step, I added native label (P1705) as additional column, but that didn't seem to help. Sample value: mul:"Testing"
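
For reference, a minimal sketch of how a tabbed file in the layout above could be generated. It assumes the source data is available as (id, name) pairs; the function name `build_import_file` and its parameters are made up for illustration and are not part of Mix'n'match or of the DFD export.

```python
import csv
import io

# Column names as used in the tabbed import file above (without the "note" column).
HEADER = ["id", "name", "description", "type", "P282"]


def build_import_file(entries, description="family name",
                      type_qid="Q101352", script_qid="Q8229"):
    """Return tab-separated text with a header row.

    `entries` is an iterable of (id, name) pairs. This input shape is an
    assumption; the actual headerless CSV may be laid out differently.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for entry_id, name in entries:
        # Every row gets the same description, type and writing system.
        writer.writerow([str(entry_id), name, description, type_qid, script_qid])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_import_file([("1", "Müller"), ("1095930", "von Poppen")]), end="")
```

For a German-language catalogue, `description="Familienname"` could be passed instead.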

Also, I finally managed to update https://mix-n-match.toolforge.org/#/catalog/2844, adding "writing system" to entries. Somehow I couldn't update the type. Maybe simply using P31 instead could help. Accordingly, automatches at https://mix-n-match.toolforge.org/#/list/2844/auto still match people or disambiguation pages.
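
To illustrate the kind of matching I have in mind, here is a rough sketch of exact-label matching restricted to family name items. It is not how Mix'n'match works internally; the candidate structure (dicts with `qid`, `label` and `P31` keys) and the placeholder item IDs are assumptions made up for the example.

```python
FAMILY_NAME = "Q101352"  # family name, the type used in the import file


def exact_family_name_matches(entry_name, candidates):
    """Return the IDs of candidates that match `entry_name` exactly.

    A candidate qualifies only if its label is identical to the entry name
    and its P31 values include family name. `candidates` is a hypothetical
    list of dicts; real data would have to be fetched from Wikidata first.
    """
    return [
        c["qid"]
        for c in candidates
        if c["label"] == entry_name and FAMILY_NAME in c["P31"]
    ]


if __name__ == "__main__":
    # Placeholder IDs, not real items.
    candidates = [
        {"qid": "Q_A", "label": "Test", "P31": ["Q101352"]},     # prefix only
        {"qid": "Q_B", "label": "Testing", "P31": ["Q101352"]},  # exact, right type
        {"qid": "Q_C", "label": "Testing", "P31": ["Q5"]},       # exact, but a person
    ]
    print(exact_family_name_matches("Testing", candidates))
```

With such a rule, "Testing" would no longer be matched to "Test", and items that are people would be skipped.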

Jura1 (talkcontribs)
Julian Jarosch (digicademy) (talkcontribs)

Thanks for your investigations! I followed your examples for a test run on catalog 4200 with the newest entries, which looked good. Therefore I updated the main catalog 2844 with the current complete list. I skipped the native label column for now, as per your caveat.

There’s currently no progress on our part in changing the lists we provide to the new format and fully automating the update. However, now that I’ve tried it out and it works, I think I could do the update manually from time to time, as a temporary stopgap. I hope we’ll get around to this eventually.

Jura1 (talkcontribs)
Reply to "Mix'n'match for P6597"