
Wikidata:Project chat


Wikidata project chat
Place used to discuss any and all aspects of Wikidata: the project itself, policy and proposals, individual data items, technical issues, etc.
Please take a look at the frequently asked questions to see if your question has already been answered.
Please use {{Q}} or {{P}}, the first time you mention an item, or property, respectively.
Requests for deletions can be made here. Merging instructions can be found here.
IRC channel: #wikidata
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2019/03.










Please block User:UU

He vandalised many interwikis about Planet X or Planet Beyond Neptune.  – The preceding unsigned comment was added by 2001:2d8:e290:9990::ba48:af02 (talk • contribs) at 16 February 2019‎ (UTC).

Duplicate surname Liu

There are two items for the surname Liu at Liu (Q804970) and Liu (Q39000092), one supposedly in Chinese and the other in Latin script. They both have quite a few items using them as family name (P734). Q804970 seems to be a bit more popular and has sitelinks to Wikipedias, while Q39000092 is linked to a Commons category. But what are the criteria by which one item or the other is used for a particular person? If I see a pattern, it's that many of the items using Q39000092 are researchers. Ghouston (talk) 10:27, 6 March 2019 (UTC)

@Harmonia Amanda: regarding surnames and @Sic19: regarding researchers. Mahir256 (talk) 17:23, 6 March 2019 (UTC)
I chose to use Latin transliterations for Chinese surnames, if there is an existing item, because I often see multiple items for surnames which appear to be family name variants in the original language and I do not know which would be appropriate. For example, here are some of the items for the family name Li: Li (Q686223), Li (Q770891), Li (Q13588410), Li (Q15283218), Li (Q3447118), Li (Q10910874), Li (Q11983876), Li (Q17008106). The only reason Liu (Q39000092) is used on researchers is that I am trying to improve the sparse items that have been created from ORCIDs. Simon Cobb (User:Sic19 ; talk page) 18:08, 6 March 2019 (UTC)
So if you are taking data from a source in Latin script, you'd use a matching name with Latin script, but taking data from a Chinese source you'd use a name with Chinese characters. This does result in splitting people somewhat arbitrarily between different items depending on where their data was found. But then which item should be used when you have both a Chinese and Latin version of the name, e.g., if they have articles on two Wikipedias, or if there are web pages with different scripts? Ghouston (talk) 22:47, 6 March 2019 (UTC)
The Latin-script name should be used for people with a Latin-script native language. There are quite a few Americans, French, Germans, etc. of Chinese descent who genuinely have "Liu" now as a family name (thinking of Alysa Liu (Q55356854) for example). Chinese researchers should definitely not have Latin-script names, because those are not their names. If we don't know their family names, we don't add them. We certainly don't add one we know to be false! --Harmonia Amanda (talk) 10:50, 7 March 2019 (UTC) Edit: to be more clear, Liu (Q39000092) is not an item about "Liu and other family names transliterated as 'Liu'", it's an item for "Liu, the Latin-script name". It should not be used for people whose names we know are transliterated as 'Liu' but are not 'Liu'. Liu (Q804970) is about the family name 刘, which is transliterated as 'Liu' (among other transliterations). --Harmonia Amanda (talk) 11:02, 7 March 2019 (UTC)
So what should we do when someone with a non-Latin script name has used a transliteration of their name and this is the data we are working with? And how are we supposed to establish whether someone is using a transliteration or whether or not the name we are working with is their genuine or native language name? All of the data I have used to add surnames is in Latin script and a lot of the data is created by the persons represented by the items I am editing, thus I used the Latin script item Liu (Q39000092). For example, Yongsheng Liu (Q42834021) has ORCID iD (P496) linking to a profile for Yongsheng Liu, who is based in China and the source of the data in this ORCID. Even when the language setting in ORCID is changed the name remains Yongsheng Liu. If Chinese researchers definitely should not have Latin-script names, please can you tell me what Yongsheng's family name is? Also, I notice that Liu (Q39000092) has said to be the same as (P460) statements linking to Liu (Q804970) and Liu (Q13391498) - is this correct or not? Simon Cobb (User:Sic19 ; talk page) 17:59, 7 March 2019 (UTC)
There are also bilingual people who will use both a Latin and non-Latin version of their name, depending on which language they are using at the time. Besides that, we don't always know what a person's native language is. Ghouston (talk) 00:14, 8 March 2019 (UTC)
Would we potentially need multiple family name (P734) statements for the surnames a person uses in different languages? The constraints on family name (P734) don't currently include a language qualifier, and it may make it hard for users of the data (templates etc.) Ghouston (talk) 00:24, 8 March 2019 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── My two cents:

  • If the database doesn't include information about the family name in the native language, I think we should keep the Latin transliteration because there can be several different family names that have the same transliterations. As a native speaker, I'm able to find out that Yongsheng Liu (Q42834021)'s family name is indeed Liu (Q804970) instead of Liu (Q13391498), but we cannot make this assumption by default. If we do find the family name in the native language, we can then switch it from the Latin version to the non-Latin version.
  • For cases like Alysa Liu (Q55356854), the situation is more complicated. Although she was born and grew up in the US and should have the Latin version Liu (Q39000092) as her family name, one can definitely argue that Liu (Q804970) is also her family name since it is her father's family name. We may then ask: is Liu (Q39000092) also her father's family name? (It's just an example; her father doesn't have an item yet.) Although her father was born and grew up in China, he has lived in the US for 30 years and uses his Western name, Arthur Liu, in his daily life. I'm not sure what the best way to deal with these cases is.
  • There are also different transliteration systems, for example Liu (Q804970) can be transliterated as Liu, Lau, Lieu, etc. Andy Lau (Q16766)'s family name is Liu (Q804970), but should we also add Lau (Q16871901)? How about Ted Lieu (Q7693450), whose Chinese family name is also Liu (Q804970)?
  • As a side note, I notice that different Chinese family names that have the same transliteration are linked by said to be the same as (P460) or different from (P1889) or both. It doesn't make sense to me that said to be the same as (P460) is used here since they are totally different names that just happen to have the same transliteration. (Actually if we keep the tone marks in the transliterations, they might be different, for example Liu (Q804970) is Liú and Liu (Q13391498) is Liǔ.) I cannot see how the description of said to be the same as (P460) (this item is said to be the same as that item, but the statement is disputed) can be applied here. Using said to be the same as (P460) for linking the non-Latin version and the Latin version could be okay but I'm not sure.--Stevenliuyi (talk) 04:12, 9 March 2019 (UTC)
It seems to me that people who have a non-Latin script name, and also either publish in English (or another Latin-script language) or live in a Latin-script country, basically have two names. In the case of living in another country, you'd probably have to supply some kind of translation for certain identity documents so you'd have an official transliteration. When publishing, I'd guess most people would tend to use the same transliteration consistently instead of choosing a different one for each publication. Ghouston (talk) 10:23, 9 March 2019 (UTC)
  • You can add several values in P734. Ideally, I'd start with one matching the native language label, but others are possible. --- Jura 10:16, 10 March 2019 (UTC)
  • Is Hadji (Q56244870) a Latin script or Arabic script name? A Moroccan footballer and a French television presenter are linked to it. Who knows. Ghouston (talk) 10:57, 10 March 2019 (UTC)
  • Not all surname items are currently as developed as they should be. Please add "native label" and "script" properties to the item once determined.--- Jura 11:00, 10 March 2019 (UTC)
I suppose creating multiple given name or surname statements for different versions is the best that can be done when using this Wikidata system, and I can't think of a good alternative system. The difficulty will probably be that a lot of users of Wikidata won't understand these complexities. I created an item Hakeem (Q62029375) for a name of Arabic origin, which has an alternative Hakim (Q19965937), but no Arabic version. I'm not going to create one, since I don't know Arabic script and it's tricky (with letters taking different forms depending on their position in a word). Most likely, people will just link to the Latin versions. Ghouston (talk) 20:40, 12 March 2019 (UTC)
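The usage pattern asked about at the top of this thread (which people point to which Liu item) could be checked with a query. Below is a minimal sketch in Python that only builds the SPARQL text; it assumes the query would be pasted into the Wikidata Query Service, where the `wd:` and `wdt:` prefixes are predefined. The function name `build_usage_query` is made up for illustration.

```python
def build_usage_query(name_items):
    """Return SPARQL text counting family name (P734) uses per name item.

    name_items: list of item IDs such as "Q804970" (illustrative input).
    The query is only built here, not sent anywhere.
    """
    values = " ".join(f"wd:{qid}" for qid in name_items)
    return (
        "SELECT ?name (COUNT(?person) AS ?uses) WHERE {\n"
        f"  VALUES ?name {{ {values} }}\n"
        "  ?person wdt:P734 ?name .\n"
        "}\n"
        "GROUP BY ?name"
    )


# The two Liu items discussed in this thread.
query = build_usage_query(["Q804970", "Q39000092"])
print(query)
```

Adding a further pattern such as occupation to the `WHERE` clause would be one way to test the impression that the Latin-script item is used mostly on researchers.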

Let's talk about gender

The property sex or gender (P21) seems like it could use improvement.

The good: The property allows for an impressively diverse range of values including "two-spirit", "transfeminine", "agender", etc.

The bad: because the property only allows you to pick one single value, it is very inflexible. For example, when describing a woman you have to pick between "female" and "transgender female". This ghettoizes transgender people: the question of whether someone is female should be considered separately from whether they are cisgender or transgender. Also, look at the case of Mauro Cabral Grinspan. Over on Wikipedia, sche pointed out to me that Grinspan identifies as transgender, male, and intersex, and all three are important parts of his identity as an activist and a human being. However, currently, P21 does not allow you to select all three. Instead you must choose between "transgender male" or "intersex". (Currently he's listed as transgender male.)

One obvious solution: allow multiple values. I found a report about cataloging gender that advocates this approach. Here is what that might look like:

Alternatively, instead of allowing multiple values, perhaps we could keep the one value constraint but open things up to qualifiers, to similar effect:


Note that in my view, strong preference should be given to current self-identification (as recorded in reliable sources). For example, a trans woman should not be listed under both "female" and "male" but just "female". The RDA cataloging standard, which I believe is considered pretty authoritative in the library world, has a similar view. It says, under "instructions for recording gender": Gender is the gender with which a person identifies.

Related to the above, my biggest fear with opening up the gender category like this is that editors will just indiscriminately slap both "male" and "female" onto practically any trans or queer person. Perhaps, to avoid this kind of "excessive gendering", it would be better to go with a single value + qualifiers approach.

Anyway, that's my two cents, but note I'm very new to Wikidata and don't claim to be an expert on either cataloging or gender. :)

(*) Why didn't you list Shakespeare or Chastain as cisgender? While it would seem fairly reasonable to describe them as cisgender, if someone hasn't identified as such in a reliable source, perhaps it's best to leave it out. (Jeffrey Tambor, meanwhile, has publicly called himself cisgender.)

(**) Why did you list Rebecca Sugar as both nonbinary and female? Because she identifies as a "nonbinary woman". Again, self-identification is king. (...Or queen. Or monarch.)

WanderingWanda (talk) 03:05, 9 March 2019 (UTC)

Note: There's quite a bit of previous discussion at Property talk:P21. Just a thought: if we're going to use self-identification as a qualifier, we should try to use qualifiers that are commonly and consistently used (as we should for any label, really). I myself am a cis-gender male, biologically. I identify simply as male, even though I just said I am cis-gender one sentence ago, and my future identification might change depending on whether I'm speaking on chromosomes or gender studies. The majority of male and female humans in history probably are or were cis-gender. As far as I know, Jeffrey Tambor (Q320204) referred to himself as a cis-gender man exactly once in a speech directly related to him playing a transgender character. Does that mean Tambor "identifies as cisgender male"? Is that enough for a qualifier? I'd argue no, unless he regularly corrects people when they call him an unqualified man. But gender and sexuality are certainly fluid and complex. I'm no expert either, and perhaps the rigid granularity of Wikidata doesn't fully allow for proper record keeping in this field. Animalparty (talk) 05:44, 9 March 2019 (UTC)
Animalparty I understand what you're saying but I think you're coming at this from the wrong angle. I don't think someone's cisness disappears just because they don't talk about it. Same with someone's transness or maleness or femaleness or straightness or gayness or whiteness or blackness. I rarely mention the fact that I'm white, but that doesn't mean I'm not white! It's also worth noting that people in a dominant majority group will naturally talk and think about that group a lot less than people in a marginalized minority group. Anyway, in my view Jeffrey Tambor said he's cisgender, and he's never indicated he's not cisgender, therefore, I think it's perfectly reasonable to label him as cisgender. WanderingWanda (talk) 07:12, 9 March 2019 (UTC)
I'll add that the only reason I used the word "qualifier" is because that's the Wikidata terminology and I was a little uncomfortable using that specific word. (Hmm, perhaps the fact the word "qualifier" is a little uncomfortable in this case is a reason to avoid using qualifiers to solve this.) WanderingWanda (talk) 07:21, 9 March 2019 (UTC)
I think it would be better to have two different properties: one for the sex as defined by the chromosomes, and one for gender as defined by the persons themselves. --GPSLeo (talk) 11:10, 9 March 2019 (UTC)
How do you propose we get people's chromosome data, exactly? Is Wikidata going to start commissioning large scale blood tests? :) WanderingWanda (talk) 16:26, 9 March 2019 (UTC)
I would only distinguish between having a Y chromosome and not having one. Of course we do not have the data in most cases, but then we should just not imply that we have that information, as is the case now. --GPSLeo (talk) 17:28, 9 March 2019 (UTC)
Unlike blood type (P1853), we do not actually need blood tests to determine this. Property_talk:P21#Reasonable_inferences. --Yair rand (talk) 18:41, 10 March 2019 (UTC)
I was going to argue that it seems like the decision to combine sex and gender was made years ago, so do we want to relitigate it? I feel like I went "what if we made these tweaks" and you came in with "ah but what if we just throw out what we have and start over?" :) WanderingWanda (talk) 04:36, 11 March 2019 (UTC)
(Was this a reply to my comment, or GPSLeo's? I'm in favor of maintaining one unified property using the current format.) --Yair rand (talk) 06:23, 11 March 2019 (UTC)
I might have misunderstood your position. WanderingWanda (talk) 15:38, 11 March 2019 (UTC)
@GPSLeo: (Via edit conflict) Very problematic. Just for some examples:
In most cases we have no idea who has XYY syndrome (Q267602) rather than simply being male.
Most transgender people see it as insulting to focus equally on their biological gender.
If we really want to model this, there is also presentational gender, which may be multiple for the same person: consider drag performers. And drag performers also can bring up some interesting issues about self-identified gender. Most identify with their biological gender. Some are transgender. Some start by identifying with their biological gender, then become increasingly transgender-identified over time. Many prefer a different set of pronouns depending on their presentation at the moment. - Jmabel (talk) 16:30, 9 March 2019 (UTC)
I admit I don't know much about drag culture, so maybe this is the wrong way to look at it, but: if we wanted to model a drag queen's in-drag gender presentation, my thought is that it should essentially be treated as the gender of a fictional character, like the gender of Harry Potter. WanderingWanda (talk) 16:51, 9 March 2019 (UTC)
The way Wikidata currently treats drag queens seems fine to me, though. As it is now: the label is the performer's drag/stage name, and then the performer's less-well-known non-drag name is listed as an alias, and the gender is listed as whatever the performer currently identifies as outside of drag. So the drag performer Eureka O'Hara is listed with the alias "David Huggard" and the gender of "non-binary" (because the performer identifies as, according to Wikipedia, "genderfluid and gender-neutral.") Meanwhile Alexis Michelle is listed with the alias "Alex Michaels" and the gender "male" because apparently the performer identifies as male outside of drag. WanderingWanda (talk) 17:54, 9 March 2019 (UTC)

Technical question: is it possible to set things so that a statement can have multiple values but certain combinations of values are constrained? For example, could the gender property be set so that you can combine the value “transgender” with the value “male” but there’s either a hard or soft restriction on combining “male” and “female”? WanderingWanda (talk) 01:34, 10 March 2019 (UTC)
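To make the question above concrete, here is a minimal Python sketch of the kind of check being asked for: multiple values are allowed, but certain pairs trigger a soft warning. This is purely illustrative. It is not an existing Wikidata constraint type, and the names `SOFT_CONFLICTS` and `conflicting_pairs` are hypothetical.

```python
# Hypothetical list of value pairs that should raise a soft warning
# when both appear on the same item. Not taken from Wikidata itself.
SOFT_CONFLICTS = {frozenset({"male", "female"})}


def conflicting_pairs(values, conflicts=SOFT_CONFLICTS):
    """Return the conflicting pairs present in `values`, each as a sorted tuple."""
    present = set(values)
    return [tuple(sorted(pair)) for pair in conflicts if pair <= present]


# "transgender" + "male" passes; "male" + "female" is flagged.
print(conflicting_pairs(["transgender", "male"]))
print(conflicting_pairs(["male", "female", "transgender"]))
```

A "hard" restriction would reject the edit when the returned list is non-empty, while a "soft" one would only show a warning. Whether the constraint system can express either is the open question here.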

I would support allowing the property to have multiple values to cover these cases. An alternative would be to simply expand the number of values, so that besides "transgender female" one could have "intersex female", "cisgender female", "intersex transgender female", etc, but the number of such values would be large (consider that it already allows things like "kathoey" and "two-spirit", and any of these might co-exist with "intersex"), and besides there are the other aforementioned conceptual reasons (the distinctness of being trans/cis from being male/female) that simply allowing multiple values would be more sensible. (I would oppose a "chromosomal sex" parameter, since for almost all people, the parameter would be guesswork, and why add a field for something you know going in is going to be unverified and unverifiable guesswork in the vast majority of cases? Perhaps something like "assigned sex", while still very problematic, would at least be somewhat less farcical... but in most cases, we only know someone's gender: whether they present themselves, dress, etc as a man, woman, etc.) -sche (talk) 22:19, 10 March 2019 (UTC)
No, it is actually quite possible to infer it. See the linked section above. --Yair rand (talk) 00:03, 11 March 2019 (UTC)

One more thing: The fact that Caitlyn Jenner and other trans people have a "start time" qualifier added to their gender feels off to me. Gender transition is a lifelong, multistep process and someone coming out as trans is best thought of as a reveal rather than a sudden change. Caitlyn Jenner may have announced that she was a woman on "1 June 2015" but that doesn't mean that's the official start of her womanhood. In fact, if you read the Vanity Fair article that is used as a reference for that "1 June 2015" date, you'll discover that Caitlyn Jenner's family had been discussing her gender for decades before she publicly came out. I recommend we create a new "coming out date" property and use that instead of "start time". WanderingWanda (talk) 23:54, 10 March 2019 (UTC)

See significant event (P793), although I think that would be supplementary data rather than a replacement for start time. Note that start date properties can have variable precision. --Yair rand (talk) 00:03, 11 March 2019 (UTC)
Maybe instead of "coming out date", I'll propose new property "announcement time". That could be used for a lot of different things, including the date when someone came out as trans, and could also, presumably, be made to allow for variable precision. WanderingWanda (talk) 04:15, 11 March 2019 (UTC)
Personally, I think using P793 in more cases has many advantages over creating different properties, although some think otherwise. See Wikidata:Properties_for_deletion#.7B.7BPfD.7CProperty:P606.7D.7D for an ongoing discussion mostly about whether P793 should be used instead of many separate properties. --Yair rand (talk) 06:23, 11 March 2019 (UTC)

Looking into things a bit more, "sex or gender" originally did allow multiple values, and, unfortunately, it resulted in exactly the "excessive gendering" problem I was worried about. At one point Chelsea Manning, for example, was listed as "male" and "female" and "transgender female". Yikes.

The way Wikidata handles gender now is problematic but it apparently is a big improvement on how things used to be. Suddenly I'm much less inclined to try and change things, lest they regress.

Still, I think opening the gender property back up to multiple values could allow us to paint a better, more accurate, and more nuanced picture of who a person is, if we worked towards the goal of reflecting and respecting a person's latest self-identification and shared the basic understanding that trans women are women and trans men are men. WanderingWanda (talk) 16:00, 11 March 2019 (UTC)

sex or gender (P21) is in some languages now named only "sex", as was proposed years ago. Mixing two concepts is not good; there are tools (categorization, grammatical case, inflection etc.) which require binary input - man/woman/other (not specified, unknown, special). Everybody was born as man or woman (and has this information in their birth certificate (Q83900)), but some of them identify themselves as something other. So I think everybody should have P21=man/woman (in special cases (deprecated?) with qualifier at birth) and these special cases can have a second statement with something other. JAn Dudík (talk) 07:31, 13 March 2019 (UTC)

  • No, actually not everyone was born as man or woman (see en:Intersex). For several percent, it's really a matter of gender assignment at birth. - Jmabel (talk) 15:45, 13 March 2019 (UTC)
  • The claim that multiple values aren't allowed on Wikidata for sex or gender (P21) is false. The one value constraint indicates that a property generally has only one value, but doesn't say that it's forbidden to have more than one value. There's nothing standing in the way of adding multiple values and qualifying them with start time (P580), end time (P582), subject has role (P2868) and similar properties to describe the status of the relevant claim. If you do make multiple claims, the constraint warning even goes away when you flag one of them as the "preferred value" (and it makes sense to use the most recent self-identified value for that).
When it comes to "announcement time" feel free to propose the property with multiple examples from different domains where it would be useful. I don't think significant event (P793) is good as a qualifier for claims about when a claim started being true. significant event (P793) doesn't subclass point in time (P585).ChristianKl❫ 15:11, 14 March 2019 (UTC)
  • I could be wrong but I believe, when I first looked at the property, the one value constraint was set to mandatory. I do see that multiple values are currently allowed, however. WanderingWanda (talk) 03:27, 16 March 2019 (UTC)
In 2014 the constraint was first set to the current wording, because in cases like this it's useful to enter multiple values. When WMDE implemented the features that make constraints more visible, they tasked a UX person who had no experience with using Wikidata with writing texts that explain the constraints. That UX person thought that the constraint should be mandatory and wrote corresponding texts. Unfortunately, they didn't check in with the existing Wikidata community about whether we want the constraint to be mandatory. It wasn't clear to the person that it's important to have the flexibility in cases like that. It was up for a while in that wording until I changed it back to the original wording. ChristianKl❫ 11:04, 17 March 2019 (UTC)
I don't think setting start and end dates helps (some or all of) the cases we're talking about, unless it's considered sensible and acceptable for multiple Qs to cover the same period and for that period to sometimes or often be the individual's whole life. Indeed, the ability to set start and end dates in this case seems more liable to be misused, if someone wants to pick a moment (of whatever degree of precision) and say "before then this trans woman was a man" (which is ... fraught with problems), than to have a good purpose. I don't know that setting a "preferred" value and other values as less-preferred helps either, in cases like e.g. the aforementioned Mauro Cabral Grinspan, an intersex trans man who is both an intersex and a trans activist and thus might not see one of those as more primary/privileged than the other. The solution seems to be to set multiple values at the same top level of "preferred"-ness, and I'm pleased to see that this appears to be possible. (Now to see if the edits stick...)
The remaining question is, I suppose, whether to continue handling trans women with the "transgender female" label or to prefer using "female"+"transgender", the latter of which I am pleased to see seems theoretically/technically possible since Q189125 exists. :) -sche (talk) 07:12, 18 March 2019 (UTC)
I didn't say anything about setting values as less preferred. You might want to read up on what the preferred rank is within Wikidata. There are many cases where a data-consumer does want a truthy value of what someone's gender is. I consider it to be a reasonable expectation for data consumers to treat properties that are tagged with 'single value constraint' as providing reasonable truthy values. Using "transgender female" has the advantage that this is a value that can be truthy, while the combination of two labels would mean that only one of them would be returned when a data-consumer asks for a truthy value. ChristianKl❫ 12:37, 18 March 2019 (UTC)
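For readers unfamiliar with ranks, the "truthy" selection mentioned above can be sketched as follows. It follows Wikidata's general rank rule: if any statement has preferred rank, only the preferred ones count, otherwise the normal-rank ones do, and deprecated statements never count. The function name `best_rank_values` and the example data are hypothetical, and a consumer that needs exactly one value still has to choose among the results.

```python
def best_rank_values(statements):
    """Return the values of the best-ranked statements.

    statements: list of (value, rank) pairs, where rank is one of
    "preferred", "normal" or "deprecated".
    """
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        return preferred
    # No preferred statement: fall back to normal rank; deprecated is ignored.
    return [value for value, rank in statements if rank == "normal"]


# Illustrative data only, not taken from any real item.
example = [
    ("male", "deprecated"),
    ("female", "normal"),
    ("transgender female", "preferred"),
]
print(best_rank_values(example))
```

With two values both left at normal rank, both are returned, which is the situation the single value constraint warns about.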

Light-on-dark color scheme for Wikidata

Checking Light-on-dark color scheme on the English Wikipedia this is what I want for my interface but on wikidata. If I hit Preferences --> Appearance my skin is Vector. Does anybody have any Custom CSS for me so I can enable night mode/dark theme/dark mode? Even if you recommend a resource on let's say Wikibooks or Wikiversity I'd be happy to read up on anything you might recommend that after studying it might help me learn how to do it. Btqfshfst (talk) 21:52, 9 March 2019 (UTC)

@Btqfshfst: There are a number of Chrome extensions that purport to do this for any website. You might also want to look into Bovlb (talk) 20:26, 11 March 2019 (UTC)
@Bovlb: I did look into the link. User:Dbfyinginfo/vector.css has now been modified and a lot is now in "dark mode"/"dark theme". Not this edit screen which happens to be Dark-on-light but that's ok. I fixed most of my problems and I thank you for that! I can probably experiment more myself to perfect it Dbfyinginfo (talk) 19:30, 14 March 2019 (UTC)


Hi, there are many names a remote village in Greece went by through the years. While I tried to add them all, a user always removes them. These names are not used anymore but they are found in historic sources like government gazettes which referred to this village by those names. The village is Dorvitsa (Dorvitsa (Q5299063)). Please take a look at the Wikidata item's history and tell me who is wrong. Thank you. (TakisA1 (talk) 23:42, 9 March 2019 (UTC))
@Chalk19: Xaris333 (talk) 10:54, 10 March 2019 (UTC)

@Chalk19: the summary is strange, for me this is exactly what aliases are here for (that's my interpretation of Help:Aliases anyway). Cheers, VIGNERON (talk) 11:12, 10 March 2019 (UTC)
@VIGNERON: "Obscure" or lost-in-time-and-space variations of place names like this, not in use for centuries, or perhaps with very limited usage (as in a family circle, if one specific variation appears just once in a dowry contract of 1774), do not mean "also known as". Variations found, say, in manuscripts lying in dust on the shelves of a local archive, or other similar sporadic versions or even misspellings by scribes or writers who put down a name as they thought it sounded 300 years ago, do not mean "also known as". "Also known as" means alternative, or even completely different, names as recorded in older school textbooks, old encyclopaedias etc. --Chalk19 (talk) 22:22, 10 March 2019 (UTC)
PS. Most places in Greece have tens of variations of their names like those of Dorvitsa, starting from time immemorial: variations found in old manuscripts or some very old books, such as a "v" instead of a "b", a "p" in the place of a "b", a "g" in the place of a "k", an accidental interchange of letters, a "t" replacing "d", a "th" (Greek δ, which sounds like the "th" in "the") replacing "d", an "o" instead of "ou", a sometimes missing "o", a "tz" in the place of "ts", or just a stress mark on another syllable, etc. These are irregular and discontinuous forms, and they don't mean that the place is "also known" by all these alternate, sporadic forms. Most of them just appeared somewhere because in those days there was no "standard" version of the name of the place, and a scribe or an author would change the name according to what was closer to his (probably not "her" in those times) cultural background (whether he was of this or another ethnic origin etc.). Variations such as the above-mentioned are not "other common names", or "alternative names" to the "most common name […] known by to readers", as required in Aliases: Criteria for inclusion and exclusion. --Chalk19 (talk) 22:39, 10 March 2019 (UTC)
PS2. Sometimes variations of this kind may be included (according to the reliable sources available on them) in a summary, in a section on older names or recorded name forms of a place, in its article in WP. --Chalk19 (talk) 22:57, 10 March 2019 (UTC)
@Chalk19: hmm, I agree, very rare variations found in old manuscripts are not what aliases are intended for (though aliases can still be used to store them), but here, according to TakisA1, these come from « government gazettes ». So it seems ~correct to me, and there is no such thing as having too many aliases, the more the better. Anyway, if not in aliases, these names can also be stored in properties (name (P2561) or a more specific one) where references can explicitly be added. Cheers, VIGNERON (talk) 08:00, 11 March 2019 (UTC)
PS: « translations and transliterations, should be recorded as aliases » (Help:Aliases), so yes, a "tz" in the place of "ts" is acceptable per this rule (and it's a good thing, as most people don't know that 'tz' and 'ts' are equivalent in Greek).
@VIGNERON: Well, does anybody really think that a Greek village actually has 17 (!!) different, alternative common names? Not even Constantinople, an imperial capital for 1,600 years, has so many! ("Βυζάντιο", "Νέα Ρώμη", "Κωνσταντινούπολη", "Πόλη", "Βασιλεύουσα", "Ισταμπούλ").
TakisA1's claim that all these "are found in historic sources like government gazettes" is a totally misleading statement. According to his own source (provided in this edit summary), only Δοροβίτσα is from an 1836 government gazette, "Δοροβίτσα, (εφημερίδα κυβερνήσεως 1836)". For the rest we get no information, neither where they have been found, nor (and this is crucial) how widespread their use was (if any, in public). Finally, please note that TakisA1's source is not a reliable source; it is just a post on the website of a local recreational club providing something written in 1973 by the amateur historian Sokratis A. Liakos (obviously a reproduction from a book or an article of his, not mentioned in the post). --Chalk19 (talk) 09:05, 11 March 2019 (UTC)
@Chalk19: why do you think that aliases have to be « alternative common names »? It is just for « alternative names » (no matter how common they are; we do avoid the rarest variations, but they don't have to be common; look at the examples on Help:Aliases). That said, TakisA1, could you provide good references? They would be useful for name (P2561) (and maybe also for Lexemes). Cheers, VIGNERON (talk) 10:32, 11 March 2019 (UTC)
@VIGNERON: According to the Wikidata policy on the matter that I have already quoted above (Criteria for inclusion and exclusion): "The label on a Wikidata entry is the most common name that the entity would be known by to readers. All of the other common names [my emphasis] that an entry might go by, including alternative names; acronyms and abbreviations". So, we are clearly talking about "other common names" (my emphasis). Furthermore, we "should not include […] spelling mistakes". So, what is the proof of how widespread the forms "Τεροβιτζιά" or "Τιροβήτσα" actually were, found somewhere (where? in an official document? in an entry by some semi-illiterate book-keeper of the area? on a gravestone?) in 1770 and 1791 respectively (according to TakisA1's source [1])? Meaning, were they "common names" or "other common names" of those times? Moreover, couldn't "Τεροβιτζιά" be the "common name" and the similar-sounding "Τιροβήτσα" an accidental misspelling of it, or vice versa? And what about "Ντερβιτσά", recorded (according to the same source) by an anonymous French traveller who passed by in the early 1800s? Did he put down the name in his notebook in Greek as it appears in the post? Or in French, i.e. in Latin script, in which case what is the original entry, and who transliterated it to the Greek alphabet? And, then, isn't it highly probable, as was the case with foreign travellers in those days, that he didn't know Greek, so he misspelled the name of the village? Isn't it possible that, since he was French, he wrote a "D" in the place of the initial Greek letter "Δ", or a possible initial "Τ"? So, why must we include "his" version of the name, "Ντερβιτσά", in our "Also known as" list of the "other common names"? In other words: has TakisA1 provided any reliable secondary source about all these 17 alternative forms of the village name as its "other common names"? Not so far. ——Chalk19 (talk) 11:35, 11 March 2019 (UTC)
I found this source [2] which says that François Pouqueville was the French traveler. Although this source at first doesn't seem accurate, the information provided by this article seems well researched. The first source comes from the official site of the union of people from Dorvitsa. Those versions indeed are not yet used, but being a person from this village I can tell you we only use Dorvitsa or Dorvitsia.(TakisA1 (talk) 18:56, 11 March 2019 (UTC))
The "official site of the union of people from Dorvitsa"! What a euphemism for just the website of just a local recreational club! "[S]eems well researched"? hmmm, but is it, or it just seems so (to you perhaps, but not to me). ——Chalk19 (talk) 19:36, 11 March 2019 (UTC)
Here, in this official document of the Hellenic Ministry of Environment and Energy [3], on page 132 there is a reference to all the names.(TakisA1 (talk) 15:14, 12 March 2019 (UTC))
You are still trying to mislead us; of course I am fully aware of these tactics of yours from el-WP. This is not any "official document of the Hellenic Ministry of Environment and Energy"; don't try to fool us again as you did before with the "government gazettes". This "official document" is just an application to the Ministry by some individuals (engineers and architects) in order to get funds for a project. Somewhere in their description of the area, they write a few words about its villages and their history: on Dorvitsa they copy-paste the other "source" of yours, i.e. the post on the website of the local recreational club. This is my last comment on the subject: I am "escaping" from this fallacious discussion that leads nowhere. ——Chalk19 (talk) 07:51, 13 March 2019 (UTC)
I did not try to fool anyone. The official name of ΦΕΚ is Government Gazette. Also, this is an official survey for the Ministry, on the ministry's official website, making this an official document.(TakisA1 (talk) 12:45, 13 March 2019 (UTC))

Merge artistic creation (Q47407603) and artistic creation (Q29586009)?

I wonder whether artistic creation (Q47407603) and artistic creation (Q29586009) should be merged. - The former refers to the "process during which a work of art comes into being", while the latter is defined as "economic activity involving the creation of artistic works". Any thoughts? --Beat Estermann (talk) 07:54, 11 March 2019 (UTC)

@Valentina.Anitnelav, Andrawaag: who created these artistic items. Multichill (talk) 16:57, 11 March 2019 (UTC)
I'm not sure if I fully understand the scope of artistic creation (Q29586009), but according to the description and a catalog code (P528) for the Statistical Classification of Economic Activities in the European Community (Q732298), artistic creation (Q29586009) represents artistic creation as an economic activity. That is not the case with artistic creation (Q47407603), which also includes notions of artistic creation outside the economic system (e.g. as a self-sufficient activity). I think it is safe to say that artistic creation (Q29586009) is a subclass of artistic creation (Q47407603). - Valentina.Anitnelav (talk)
That is a good question. Although they are related, I would argue that both terms are distinct enough from each other to stay separate, as much as Apple can refer to a fruit or a computer. artistic creation (Q29586009) is a term from a vocabulary of economic activities used to measure economic activity within a country. --Andrawaag (talk) 21:57, 11 March 2019 (UTC)
@Valentina.Anitnelav, Andrawaag: Ok, I've followed Valentina's argument and have defined artistic creation (Q29586009) as subclass of artistic creation (Q47407603) (and of "economic activity"). Cheers, --Beat Estermann (talk) 14:04, 18 March 2019 (UTC)

Suggestions based on constraints: next step

Hello all,

Last year, we enabled suggestions based on constraints values for the constraints section of a property as a beta feature. You can also have a look at the documentation and the list of supported constraint types.

After a few months of testing, we would like to enable it for all users. Before that, we would like to know more about your experience with that feature.

  • Did you encounter issues or unexpected behaviours?
  • Is there anything that should be improved before enabling it for all users?

When reporting an issue, please give specific examples and what you would have expected instead, so we can figure out the best way to solve it. You can also leave a comment in the related ticket.

Thanks! Lea Lacroix (WMDE) (talk) 13:21, 11 March 2019 (UTC)

I still have the feeling that the feature isn't ready for deployment yet. There hasn't been any work on it since its release as a beta feature. Important things such as sorting based on usage are still missing, making the feature useless for me. For example, genders are sorted very weirdly. Also, a Phabricator task about the automated results not being linked (i.e. it is not possible to open the property or item in a new tab) is still open. Sjoerd de Bruin (talk) 13:34, 11 March 2019 (UTC)
I see radically improved suggestions for “gender” in the last couple of weeks (male and female first, thank you!). All in all, I find the feature really useful - no complaints now that gender is better. - PKM (talk) 23:07, 11 March 2019 (UTC)
Thanks for your replies! @Sjoerddebruin: this is exactly why we're coming to you right now and asking for your feedback :) We will make sure to take this ticket into account. Can you check the order of genders again and let me know if there is still something going wrong? Is there anything else that we should know about and fix? Lea Lacroix (WMDE) (talk) 11:22, 13 March 2019 (UTC)
The order of the genders has improved indeed... If I find more, I'll add it to the task. Sjoerd de Bruin (talk) 11:31, 13 March 2019 (UTC)

Scholarly articles and main subject P921

I've been busy with disambiguating author strings in scholarly articles and I am wondering how main subject is added to them.

  • Is it done programmatically?
  • What are the heuristics behind it?
  • Can it be done by humans?

For example

RNA-Seq (Q2542347) has been added [4] as the main subject (P921) of

Exonuclease hDIS3L2 specifies an exosome-independent 3'-5' degradation pathway of human cytoplasmic mRNA (Q24294915)

How can one arrive at this decision?
--Kpjas (talk) 14:16, 11 March 2019 (UTC)

I think it's just done by taking keywords from the article title. 18:35, 11 March 2019 (UTC)
It certainly can be done by humans - I do that. I generally scan the first page of the actual article via the DOI, but I am mostly working with articles on costume and textiles. There’s a discussion of approaches in relation to the genewiki project here. Automated keyword scraping needs to watch out for book reviews, where the “main subject” should be the edition of the book and not the subject of the book. - PKM (talk) 23:01, 11 March 2019 (UTC)
It feels problematic to me that this statement is added by the QuickStatementBot without any ability to see which user is responsible for it. Having this way of easily adding batch statements is valuable, but if the provenance information is lost and, in cases like this, the responsible user can't be asked, that's problematic. @Lydia_Pintscher_(WMDE): @Magnus Manske: ChristianKl❫ 19:27, 15 March 2019 (UTC)

Peter Smit (Q2247768) Wikipedia disambiguation page

I have added Peter Smit (politician) (Q62030208), but how do I connect the 3 items (Peter Smit (politician) (Q62030208), Peter Smit (Q2786858), Peter Smit (Q1991362)) to the Wikidata disambiguation page? I suppose Wikidata disambiguation pages combine all the underlying Wikipedia pages. Note that Peter Smit (politician) does not have a NL page, but already has a Commons page. He was an alderman in The Hague city council, but do I use the past tense? From year x to year y.Smiley.toerist (talk) 10:50, 12 March 2019 (UTC)

PS: I would like to add the picture: File:Viering 150 jaar HTM 07.jpg. How do I do this?Smiley.toerist (talk) 10:53, 12 March 2019 (UTC)

You normally link to disambiguation pages using different from. I hooked up the three items in question for you, have a look. Moebeus (talk) 12:12, 12 March 2019 (UTC)
It is easier to change and add () to the label. Then no disambiguation is needed.Smiley.toerist (talk) 13:01, 12 March 2019 (UTC)
@Smiley.toerist: No, this is wrong. Please read Help:Label. "The label is the most common name that the item would be known by. It does not need to be unique, in that multiple items can have the same label ..." ArthurPSmith (talk) 13:37, 12 March 2019 (UTC)

I have problems completing the alderman functions. He was municipal executive of The Hague from March 2007 to 2014. Before that, he was 'municipal executive of Westland' from 2004 to 2006. Source: VVD website. I have added (Q62067625).Smiley.toerist (talk) 11:21, 15 March 2019 (UTC)

Popular classifieds website Friday-Ad

I'm looking to add the Friday-Ad to Wikidata. It's a very large UK classifieds and community website, popular down in the south of England. It's a leading marketplace for motors, pets and all sorts of second hand goods. You can view it here:

It's similar to Gumtree and Preloved.  – The preceding unsigned comment was added by Miloatfriday (talk • contribs) at 2019-03-12 14:13 (UTC).

How do I send a query that searches for a lexeme with some label?

I need to do the same as searching for an item, but in WQS. It should be easy as pie (I'm a newbie).

Example: search for the French word 'maison'.

thanks.  – The preceding unsigned comment was added by (talk • contribs) at 19:33, March 12, 2019‎ (UTC).

Have you tried "L:maison" in the "Search Wikidata" box on the top right? Autocomplete doesn't find it, but if you follow the "containing ..." link it is there. If you were looking for ways to do it with our "Query Service", see the list of "useful queries" here: Wikidata:Lexicographical data/Ideas of queries. ArthurPSmith (talk) 20:05, 12 March 2019 (UTC)
And if you really want a WQS SPARQL query, here it is:
SELECT ?l ?lemma WHERE {
  ?l dct:language wd:Q150 ; #lexemes in French
     wikibase:lemma ?lemma . #with a lemma
  FILTER regex (?lemma, "^maison$"). #this lemma being exactly "maison"
}
Cdlt, VIGNERON (talk) 23:02, 12 March 2019 (UTC)
Thank you very much. Is that too expensive? Should I search in the box, or is it the same thing? Lastly, any idea how I can paste that into wdq? That program is awfully explained.
Hey, don't publish my IP everywhere!
@VIGNERON: much more efficient:
SELECT ?l WHERE {
  ?l dct:language wd:Q150;
     wikibase:lemma "maison"@fr.
}
--TweetsFactsAndQueries (talk) 08:55, 13 March 2019 (UTC)
@unknownnewbie: when you edit while not logged in, your IP address is public on the page history, whether someone adds {{Unsigned}} or not – there should have been a warning about this when you started editing. If you don’t want that, create an account. (And I think you can contact the Administrators' noticeboard to have your IP address hidden, if necessary.) --TweetsFactsAndQueries (talk) 09:00, 13 March 2019 (UTC)
How do I paste that query here: {SPARQL}, or more precisely, how do I send a query from bash? I'm using wdq at the moment, but it seems to function at too elementary a level.
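
One way to answer the "how do I send it from a script" question is to percent-encode the SPARQL text into the `query` parameter of the query service's SPARQL endpoint. The sketch below only builds the URL and makes no network call; the function name `build_query_url` and the constant names are illustrative, not part of any Wikidata tool.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query_url(sparql: str, fmt: str = "json") -> str:
    """Return a GET URL with the SPARQL text percent-encoded in `query`."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": fmt})


# The exact-lemma lexeme lookup discussed in this thread.
LEXEME_QUERY = """
SELECT ?l WHERE {
  ?l dct:language wd:Q150 ;
     wikibase:lemma "maison"@fr .
}
"""

if __name__ == "__main__":
    # Print the URL so that a shell script can fetch it, e.g. with curl.
    print(build_query_url(LEXEME_QUERY))
```

If this were saved under a hypothetical name such as build_url.py, a shell could fetch the results with something like `curl -s "$(python3 build_url.py)"`.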

Diff of external identifiers and external data

I am looking for a tool to get external data that is not already added to Wikidata. I have the list of already created items with the qualifier and the list from the external source. What is the best way to get the IDs from the external source that do not have a Wikidata item? Maybe there is a simple command line tool that deletes all numbers that are in both columns so that I only get the ones that are in only one column. --GPSLeo (talk) 20:50, 12 March 2019 (UTC)

One can use the query service for that task, as in this example. You can provide quite a lot of identifiers—5000 should not be a problem—and it lists the ones which are not yet in use in any item in the results set. —MisterSynergy (talk) 21:05, 12 March 2019 (UTC)
Thanks. I tried it with 17.000 identifiers and even this worked. --GPSLeo (talk) 21:26, 12 March 2019 (UTC)
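
For the offline variant of this comparison (two columns of identifiers already at hand), a plain set difference is enough. This is a minimal sketch, assuming both lists are available as strings; `missing_ids` is a hypothetical helper name and the sample identifiers are made up.

```python
def missing_ids(external_ids, wikidata_ids):
    """Return the external IDs that have no Wikidata item, sorted as strings."""
    # Strip whitespace so "123 " and "123" compare equal; drop empty entries.
    external = {i.strip() for i in external_ids if i.strip()}
    known = {i.strip() for i in wikidata_ids if i.strip()}
    return sorted(external - known)


if __name__ == "__main__":
    external = ["1001", "1002", "1003", "1004"]   # made-up sample IDs
    on_wikidata = ["1002", "1004"]                # made-up sample IDs
    print(missing_ids(external, on_wikidata))
```

On the command line, `comm -23` applied to two sorted files gives the same kind of result (lines present only in the first file).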

image (P18)

Hello. Can someone set a request listing files with interior in their title on the talk page for that property? It would be very useful to easily find potential values for image of interior (P5775) statements. Thierry Caro (talk) 00:36, 13 March 2019 (UTC)

@Thierry Caro: Here's a first stab:
SELECT ?item ?itemLabel ?image WHERE {
    {
        SELECT ?item ?image WHERE {
            ?item wdt:P18 ?image .
        } LIMIT 1000000
    }
    FILTER (CONTAINS(LCASE(str(?image)), "interior")) .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
The LIMIT is needed because otherwise the query times out: add e.g. OFFSET 1000000 to look at further sets.
Note that at higher Qids this returns quite a lot of paintings containing the word 'interior', attached to the item for the painting -- you might want to filter these out.
A different approach might be to look for Commons categories containing the word "interior" -- I'm slightly reserved about the idea of looking at P18 statements for possible P5775 candidates, because on the one hand I am not sure that duplication of an existing P18 image is necessarily very helpful; while removing an existing P18 statement could if anything be worse, if that is the best representation of the subject. But see what you think. Jheald (talk) 09:47, 13 March 2019 (UTC)
@Jheald: OK. Thank you very much. I'm going to try to improve this. No image of interior (P5775) statement and multiple image (P18) statements would be nice filters to add in order to find even better candidates for a potential move. Thierry Caro (talk) 11:34, 13 March 2019 (UTC)
@Thierry Caro:
SELECT ?item ?itemLabel ?count ?image
WITH {
   SELECT ?item ?image WHERE {
       ?item wdt:P18 ?image .
   }  LIMIT 1000000
} AS %images
WHERE {
    {
       SELECT ?item (COUNT(?image) AS ?count) WHERE {
           INCLUDE %images
       } GROUP BY ?item
      HAVING (?count > 1)
    }
    INCLUDE %images .
    FILTER (CONTAINS(LCASE(str(?image)), "interior")) .
    MINUS {?item wdt:P5775 [] }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
-- Jheald (talk) 13:18, 13 March 2019 (UTC)

See also Wikidata:Request a query. Visite fortuitement prolongée (talk) 19:12, 14 March 2019 (UTC)

Best place to discuss schema?

I have some potentially-elaborate questions concerning the way we represent things such as:

  • time zones, especially "this entity is in this time zone during part of the year (and this other one during another)"
  • qualified geographic coordinates, such as "coordinates of geographic center" and "coordinates of river mouth"

(But don't answer these yet; I already know the easy answers.)

Is this the right place to discuss things like those, or is there some other forum dedicated to the schema? (And when I say "schema", do I really mean "ontology"?) Scs (talk) 03:26, 13 March 2019 (UTC)

Traditionally, discussions happen on the talk pages for particular properties. I don't think this tradition is that great, from the point of view of sustained documentation, and building on the data modelling work that goes on. You can start a discussion in that way, and ping people who you think would be interested. Or you can just use user talk pages. Or you can try a general forum, or a WikiProject talk. To move things on, you may need a few attempts. Charles Matthews (talk) 11:39, 13 March 2019 (UTC)
We have WikiProject Ontology, and Help:modeling was an attempt to centralize, or give an entry point to, all such discussions. author  TomT0m / talk page 11:50, 13 March 2019 (UTC)
Eventually there should be a consensus that a detailed "manual of style" is really needed. Right now people tend to adopt what they see as good practice on items. That's the wiki way, emergence of norms because they make sense. I'm more interested in biography than geography, but these are areas with millions of items. I think the community here should move ahead from single properties and try to develop a manual. This and referencing are the fundamental issues on content. Charles Matthews (talk) 16:59, 13 March 2019 (UTC)
@Scs, Charles Matthews, TomT0m: Note Wikidata:WikiProject ShEx is working on implementing a type of schema definition language within Wikidata - this project is in testing now, and you probably should try it out and give the developers some feedback! ArthurPSmith (talk) 17:21, 14 March 2019 (UTC)
Mmmm, I rather imagined that "shape expressions" were there to extend constraints for properties. Perhaps I misjudged them. Charles Matthews (talk) 17:57, 14 March 2019 (UTC)
I think we have attempted to use property constraints to define reasonable schemas in the past, but this is a more direct approach. ArthurPSmith (talk) 14:24, 15 March 2019 (UTC)
In the past I have repeatedly asked for an explanation of the reasons behind a schema that is largely opaque. I have often questioned all kinds of arbitrariness and the only result has been silence. Imposing stronger restrictions is not acceptable because it means that you insist on moving away from the Wiki way.
As long as no time is taken to explain what, in my view, has hardly any relevance, I am flat against any increased restrictions. Thanks, GerardM (talk) 16:39, 16 March 2019 (UTC)
Thanks for all those answers. I appreciate what Charles Matthews said about "emergence of norms because they make sense"; I guess I didn't realize Wikidata was still that fluid.
For the sake of the discussion, I'll flesh out my (now three) hypothetical questions slightly:
  1. Some entities (e.g. Cambridge (Q49111)) are located in time zone (P421) with a numeric value such as UTC−05:00 (Q5390) (or often two, qualified by valid in period (P1264) standard time (Q1777301) or daylight saving time (Q36669)). Some entities (e.g. Boston (Q100)) link to named timezones like Eastern Time Zone (Q941023). Some entities link to entities for IANA timezones like America/New_York (Q28146035). Some entities (e.g. Italy (Q38)) do more than one of the above. What's the right way?
  2. Some entities use property coordinate location (P625) to associate more than one geographical coordinate. For example, rivers (e.g. Charles River (Q794927)) often have two, qualified by applies to part (P518) river source (Q7376362) and river mouth (Q1233637). But on the other hand, there's property coordinates of geographic center (P5140). Why does it exist? Why not use plain P625 with a qualification, just like river sources and mouths?
  3. Many cities are instance of (P31) city (Q515). But many others are instances of narrower classes, like city of the United States (Q1093829). (And of course city of the United States (Q1093829) is a subclass of (P279) city (Q515).) But should all cities explicitly be plain city (Q515)s (to make it easier for people writing queries for things like "all cities with population greater than 123456")? Or do people writing queries always have to worry about subclasses?
These are the sort of questions that (a) I'd hope one day anyone could answer by reading a nice document describing the wikidata schema, or if not (b) I was wondering where one ought to ask about them. (For now I guess I'll ask them here, albeit in new threads below.) Scs (talk) 17:10, 16 March 2019 (UTC)
On the last one: despite the similarity of names, city (Q515) and city of the United States (Q1093829) are quite different sorts of things. Q515 means a larger urban agglomeration, the usual meaning of "city". Q1093829 means an entity in the U.S. that is officially a "city", which is a form of administrative unit in the U.S. (which differs a bit from state to state). Some of these Q1093829 "cities" are quite small, even with populations under 1000, so they are by no means Q515 cities. Conversely, to stick to a U.S. example, city of the United States (Q1093829) has a population of 43,713 but is a village of New York (Q55237813), a New-York-State-specific designation for a different form of administrative unit, even though it dwarfs many Q1093829 cities. - Jmabel (talk) 03:04, 17 March 2019 (UTC)
@Scs: On time zones - all the items you list are instances of (instance of (P31)) time zone (Q12143) either directly or indirectly (via time zone named for a UTC offset (Q17272482), which is a subclass (subclass of (P279)) of time zone (Q12143)). The values allowed for a property like located in time zone (P421) are specified in its property constraints - you will notice there is a "value type constraint" that specifies the values must be instances of time zone (Q12143). This is common across Wikidata properties - to the extent we can, we attempt to document the way properties should be used via their constraints. Similarly, the shape expressions I mentioned would be a way to improve documentation of items (within a specific class of items) to show what properties they should have, etc. ArthurPSmith (talk) 14:15, 18 March 2019 (UTC)
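
On the third hypothetical question in this thread (whether query writers have to worry about subclasses), a common pattern is the property path wdt:P31/wdt:P279*, which matches items that are instances of a class or of any of its subclasses. The sketch below only builds the query text; `instances_query` is a hypothetical helper, the population threshold is the example figure from the question, and such a query may still time out on very large classes.

```python
def instances_query(class_qid: str, min_population: int) -> str:
    """Build a query for instances of a class or of any of its subclasses."""
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a QID: {class_qid!r}")
    return (
        "SELECT ?item ?population WHERE {\n"
        # wdt:P31/wdt:P279* = instance of the class, or of any subclass of it
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} ;\n"
        # P1082 is the population property
        "        wdt:P1082 ?population .\n"
        f"  FILTER(?population > {int(min_population)})\n"
        "}"
    )


if __name__ == "__main__":
    # city (Q515), including subclasses such as city of the United States
    print(instances_query("Q515", 123456))
```

With this path, an item typed only as city of the United States (Q1093829) is still found when asking for city (Q515), because Q1093829 is a subclass of Q515.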

February Facto Post/systematic reviews event in Cambridge UK

w:User:Charles Matthews/Facto Post/Issue 21 – 28 February 2019 for the recent issue of the Facto Post newsletter — just a reminder that it is delivered on enWP, and you can subscribe or unsubscribe by following links in each number. The editorial there is on systematic reviews. As part of the Cambridge Science Festival this year, I'm leading a workshop on systematic reviews and related material to do with evidence-based medicine: official page ScienceSource workshop: how do scientific discoveries become clinical medicine?, and Eventbrite signup page.

The workshop is not an introduction to Wikidata, as such. I'd like to think it is an introduction to a major use case for Wikidata. Abstracting from current practice on systematic reviews and trying to think in structured data terms, one does reach areas around tagging, metadata, MEDRS and so on, quite quickly. All this material is very much adjacent to Source MD and WikiCite.

I'll be posting a more detailed programme to the Eventbrite page in the next few days. Charles Matthews (talk) 11:33, 13 March 2019 (UTC)


Could someone fluent in Russian have a look at Special:Contributions/Алексей_Скрипник, and help this user to understand the difference between data items for works and data items for editions? I have found at least one instance where he merged a literary work with one of its editions. Some of his other edits seem odd as well, such as removing language of work or name (P407) from publications. --EncycloPetey (talk) 13:58, 13 March 2019 (UTC)

✓ Done. --Ksc~ruwiki (talk) 19:52, 13 March 2019 (UTC)

Query for songs whose title contains a proper name

My try was this:

SELECT ?compositorLabel ?nacionalidadLabel ?nombreLabel ?cancionLabel WHERE {
  ?cancion wdt:P31 wd:Q7366. #there is some song (obvious)
  ?cancion wdt:P86 ?compositor. #that has some composer
  ?compositor wdt:P27 ?nacionalidad. #this composer has this country of citizenship
  ?nombre wdt:P31 wd:Q1071027.  #this proper name
   FILTER(CONTAINS(LCASE(?cancionLabel), ?nombreLabel)).  #is part of the song title.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

But it takes too long. -- 21:27, 13 March 2019 (UTC)

See also Wikidata:Request a query. Visite fortuitement prolongée (talk) 19:12, 14 March 2019 (UTC)

You can search for exact string matches (identical case) quickly, since it's an index lookup. But if you want to search case-insensitively, or for a fragment within a string, I think it will always need to scan all potential records, like you are doing with FILTER. This can time out if you don't have a way to restrict the number of candidates. It can be done in stages using LIMIT and OFFSET. Ghouston (talk) 23:34, 14 March 2019 (UTC)
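
The "in stages using LIMIT and OFFSET" idea can be sketched as a small generator that produces one query per window. This is only an illustration: `paged_queries` is a hypothetical helper, and without an ORDER BY the query service does not guarantee that the windows are stable between runs.

```python
def paged_queries(base_query: str, page_size: int, pages: int):
    """Yield copies of base_query restricted to successive result windows."""
    for page in range(pages):
        offset = page * page_size
        yield f"{base_query.rstrip()}\nLIMIT {page_size} OFFSET {offset}"


if __name__ == "__main__":
    base = "SELECT ?cancion WHERE { ?cancion wdt:P31 wd:Q7366 . }"
    for query in paged_queries(base, page_size=100000, pages=3):
        print(query)
        print("----")
```

Each generated query can then be run separately, so that no single run has to scan all candidates at once.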

How to parse queries from bash linux?

I'm trying with wdq, but it seems too restrictive. I also tried passing queries at {SPARQL}, but I have no real idea how the SPARQL should be encoded inside that URI. By the way, have you considered implementing an SSH or telnet service? Thanks -- 21:40, 13 March 2019 (UTC)

  • The output from the endpoint will be in RDF or JSON, which I imagine wouldn't be much fun to parse in a shell script. You could write something in some other language (say with a JSON library) that outputs simple text for the shell, but if you have a decent language, why use shell? Another thing you could do is run the query interactively, download the results as CSV and process the CSV from the shell. Ghouston (talk) 07:00, 14 March 2019 (UTC)
wdq that you mentioned is the thing that outputs simple text for a shell, but a more flexible version could take a SPARQL query as the parameter, instead of limiting it to specific kinds of queries. Ghouston (talk) 07:50, 14 March 2019 (UTC)
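
The "something with a JSON library that outputs simple text for the shell" suggestion might look like the following sketch. It assumes the standard SPARQL JSON results layout (a `head.vars` list and `results.bindings` rows); `bindings_to_tsv` is a hypothetical name and the sample entity URI is made up.

```python
import json


def bindings_to_tsv(result_json: str) -> str:
    """Convert SPARQL JSON results into tab-separated lines with a header."""
    data = json.loads(result_json)
    columns = data["head"]["vars"]
    lines = ["\t".join(columns)]
    for row in data["results"]["bindings"]:
        # Unbound variables are absent from a row; emit an empty cell instead.
        lines.append("\t".join(row.get(c, {}).get("value", "") for c in columns))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample in the SPARQL JSON results format.
    sample = json.dumps({
        "head": {"vars": ["l", "lemma"]},
        "results": {"bindings": [
            {"l": {"type": "uri", "value": "http://example.org/L123"},
             "lemma": {"type": "literal", "xml:lang": "fr", "value": "maison"}},
        ]},
    })
    print(bindings_to_tsv(sample))
```

The tab-separated output can then be handled with the usual shell tools such as cut, sort or awk.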

Schedule of Wikidata entity dumps generation - important if you use them!

There is a discussion going on about changing the frequency and schedule with which these dumps are generated, see the phab task. Please weigh in over the next few days if you have a project that uses these dumps and need the schedule to be a certain way. If we hear no objections then mid-next-week we'll start figuring out how to best shuffle the start dates around. Also, if you know others who use these dumps and might not see this message, please poke them. Thanks in advance, -- ArielGlenn (talk) 09:38, 14 March 2019 (UTC)

Nonsense elevation values

As my bot request has already been ignored for half a year: will we ever do anything about the bogus elevation above sea level (P2044) values, especially for hills and mountains, imported from the Cebuano Wikipedia (Q837615)? The way these values were generated suffers both from the inaccuracy of the altitude model and, much more severely, from the inaccuracies of the coordinates, so that algorithm only gives credible results for relatively flat landscapes, quite the opposite of hills. In the past months, whenever I came across such an item, I added the missing reference, set the status to deprecated and, if I had time, even looked for the correct value. So now the following query (kudos to @Tagishsimon:) can illustrate how much our data on the height of hills is only for entertainment purposes.

SELECT ?item ?itemLabel ?normal ?deprecated ?diff ?unitLabel WHERE {
  ?item p:P2044 ?statement.
  ?statement psn:P2044 ?statement_psn .
  ?statement_psn wikibase:quantityAmount ?normal .
  ?statement_psn wikibase:quantityUnit ?unit .
  ?statement wikibase:rank wikibase:NormalRank .
  {?statement prov:wasDerivedFrom ?statement0 .
  ?statement0 pr:P143 ?normal_ref . 
  filter (?normal_ref!=wd:Q837615) }
  { filter not exists {  ?statement prov:wasDerivedFrom ?statement0 . } } 
  ?item p:P2044 ?statement1 .
  ?statement1 psn:P2044 ?statement1_psn .
  ?statement1_psn  wikibase:quantityAmount ?deprecated .
  ?statement1 wikibase:rank wikibase:DeprecatedRank .
  ?statement1 prov:wasDerivedFrom ?statement2 .
  ?statement2 pr:P143 wd:Q837615 .
  bind(?normal - ?deprecated as ?diff)
 SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

We must systematically set all the heights imported from ceb to deprecated, and add the missing references! Ahoerstemeier (talk) 11:56, 14 March 2019 (UTC)

I don't know why, but I happen to think Cebuano-inspired Wikidata items have frequent issues... But you can easily filter them out. Bouzinac (talk) 13:48, 14 March 2019 (UTC)

LinkedIn personal profile URLs

We still need to convert LinkedIn personal profile URL (P2035) to an external-id datatype; or create a new property, migrate the data, and then delete the old one. See Wikidata:Project chat/Archive/2018/08#LinkedIn personal profile URL. Which would be preferred? Are there any objections? Is anyone prepared to work with me on this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:31, 14 March 2019 (UTC)

I’m in support of making this change one way or another. If we’re going to make it an external ID, then I think it should be named “LinkedIn personal profile ID” and use a formatter URL. That's probably enough changes to justify a new property to replace the existing one, which allows for a “deprecated” period on the existing property and an orderly conversion process before the old property is deleted. - PKM (talk) 20:15, 14 March 2019 (UTC)
Support propose a new property to replace the old one. ArthurPSmith (talk) 14:22, 15 March 2019 (UTC)
@Pigsonthewing: And yes, I am willing to work on this with you. - PKM (talk) 19:12, 15 March 2019 (UTC)


OK, now at Wikidata:Property proposal/LinkedIn personal profile ID. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:34, 15 March 2019 (UTC)

Can given name and family name additions be somewhat automated?

Many person items are missing name fields: on Commons, there are currently over 260,000 person categories in Uses of Wikidata Infobox with no family name, and over 60,000 in Uses of Wikidata Infobox with no given name. There are likely many, many more "unnamed" people on Wikidata without a presence on Commons. Since given and family names are pretty fundamental biographical properties, it follows that they should be given some priority. I've been adding names to Wikidata piecemeal, but a semi-automated approach would be more efficient. However, this might be easier said than done, especially for non-Western or non-Latin script names: the Western surname Abraham (Q13367920) appears to have only one form/item here on Commons, while variants of the Japanese family name "Abe" alone include Abe (Q26000282), Abe (Q18645909), Abe (Q24091156), and Abe (Q27156022). Another potential snag is the distinction between multiple given and surnames (e.g. double-barreled surnames): "Jacob Thomas Spencer Smith" may have three given names with one surname, or perhaps two given names, with Spencer a maternal and Smith a paternal surname. Spanish-language names also often involve a matronym and patronym, one of which may be dropped in common usage. I don't claim to have any knowledge of how a semi-automated drive may proceed, but hopefully some wizards can figure it out. Perhaps start with the simplest scenarios first (whatever that may mean)? Animalparty (talk) 18:54, 14 March 2019 (UTC)

We also need to consider that some authorities will include multiple treatments of “double-barreled” surnames, and all of these should be included with references, the versions most used being preferred. That may be hard to automate. - PKM (talk) 20:22, 14 March 2019 (UTC)
IMO we also need to ask what our preferred best practice is for “double-barreled” surnames. I am far from convinced that we should be creating a new item for every such pair, which could be a huge number of additional names, sometimes vanishingly infrequent. What is our recommended practice on this at present? Jheald (talk) 14:24, 16 March 2019 (UTC)
Not to mention Abe (Q11160829) and Abe (Q56247486). I'd say the latter is the right one to use when adding a Surname "Abe" from a Latin script document, but I'm not sure that the former should ever be used in a Surname statement. Actually, Abe (Q56247486) doesn't have a script specified. It should probably be made the Latin script version, and also used for people like John Abe (Q6218046). Ghouston (talk) 22:04, 15 March 2019 (UTC)
  • Yes (if you can determine it fairly reliably) and no (if you can't). Approaches I used for given names worked fairly well. Some people who tried it for family names ended up stopping. If you want to do family names, maybe try people of the same nationality. --- Jura 08:53, 16 March 2019 (UTC)
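
The "simplest scenarios first" idea could be sketched roughly as below. This is only an illustration under stated assumptions, not an existing bot: the function name `split_simple_name` and the `PARTICLES` list are hypothetical, and the rule (exactly two plain Latin-script tokens, no hyphens, no particles) is just one conservative reading of "simplest". Anything else is left for manual review, and even an accepted split is only a candidate, since name order differs between cultures (hence the suggestion above to work one nationality at a time).

```python
from typing import Optional, Tuple

# Hypothetical list of name particles; a real run would need a curated,
# per-language list.
PARTICLES = {"de", "del", "der", "van", "von", "la", "le", "di", "da"}


def split_simple_name(label: str) -> Optional[Tuple[str, str]]:
    """Return (given, family) for the simplest case only, else None."""
    tokens = label.split()
    # Three or more tokens may hide double-barreled surnames or
    # matronym/patronym pairs, so they are skipped.
    if len(tokens) != 2:
        return None
    given, family = tokens
    if given.lower() in PARTICLES or family.lower() in PARTICLES:
        return None
    # Hyphens, initials with dots and non-Latin scripts all fail this check
    # and are left for a human to handle.
    for token in (given, family):
        if not (token.isascii() and token.isalpha()):
            return None
    return given, family


candidates = ["Jane Smith", "Jacob Thomas Spencer Smith", "Mary Smith-Jones"]
for label in candidates:
    print(label, "->", split_simple_name(label))
```

A split returned by this sketch would still need to be matched to the right given name and family name items, which is where the multiple "Abe" items mentioned above become the harder part of the problem.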

New data type: musical notation with Lilypond format

Hello all,

Following a request from community members, we just deployed a new data type called “Musical Notation” in order to store musical notation in Wikidata. Property creators can now find this new data type in the list and create new properties with it.

A property with musical notation data type will display the notation in Lilypond format, using the score extension.

For example, if you enter this code as a value: \relative c' { c d e f | g2 g | a4 a a a | g1 |}, it will be displayed as such:

Screenshot musical notation Wikidata 1.png

The score also appears on the diff pages.

Screenshot musical notation Wikidata 2.png

The existing property LilyPond notation (P5482) is used on around 300 items. If you need any help from the developers to change the datatype of the property or to migrate the content, please let me know.

One bug is already known and we’re working on fixing it: if the score is long, it gets out of the statement box and overlaps with the edit button.

If you encounter any issue, feel free to create a subtask of this ticket.

Cheers, Lea Lacroix (WMDE) (talk) 20:03, 14 March 2019 (UTC)

Not described (yet?) in Help:Data type. Any list of supported syntax? Can we use a test property on Test Wikidata? LaddΩ chat ;) 23:34, 14 March 2019 (UTC)
There's a test property on test and on beta :) Lea Lacroix (WMDE) (talk) 06:25, 15 March 2019 (UTC)
I introduced that type in Help:Data type but I noticed it is not described in mediawiki:Wikibase/DataModel either. LaddΩ chat ;) 12:13, 15 March 2019 (UTC)
Nice! ArthurPSmith (talk) 14:19, 15 March 2019 (UTC)
I look forward to using this on Q12030. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:35, 15 March 2019 (UTC)
I’ve proposed a sandbox property for this datatype; I’ll leave the discussion about real properties (convert LilyPond notation (P5482)? replace it with one new property? split it into several new properties?) to others :) --Lucas Werkmeister (talk) 00:42, 16 March 2019 (UTC)
This issue is now resolved, it seems. LaddΩ chat ;) 14:27, 17 March 2019 (UTC)

Automated processes and references

Here’s a question that may lead to an RfC, depending on your responses:

At this stage of Wikidata's evolution, should automated processes/bot jobs that add statements without also adding references for those statements be strongly discouraged?

I would assume there would be exceptions for statements like <instance of>, <subclass of>, and external identifiers, and for self-referencing items like books. Thoughts? - PKM (talk) 20:31, 14 March 2019 (UTC)

Different areas of Wikidata are more developed than others, and different areas are more in need of references than others, or more in need of raw quantity than others. There are many areas in which bots adding unreferenced statements would be less than welcome, and others where bot-added unreferenced (but hopefully at least minimally curated) statements would be a useful addition. --Yair rand (talk) 20:45, 14 March 2019 (UTC)
I do not think we should require references for automated editing, for several reasons:
  • You mention that there will be exceptions, but who defines which ones would be acceptable? I think there will be lots of exceptions and also exceptions from exceptions, which could make it difficult to comply with such a complicated policy.
  • Mandatory references complicate the batch/bot editing process, so new and less tech-savvy users might find it difficult to contribute to Wikidata.
  • There are already plenty of worthless references, often from batch jobs where users wanted to do it correctly, but their references are either not well-shaped, or the sources do not support the actual statement. I’d expect reference quality to decay a lot once we make their addition mandatory.
MisterSynergy (talk) 20:50, 14 March 2019 (UTC)
Which particular bots do you think should be discouraged? ChristianKl❫ 14:07, 15 March 2019 (UTC)
I've looked at item maturity in the past. A proto-item is for example a new empty item just created with one sitelink to Wikipedia. On the other end of the scale we have items that are very extensive and very well sourced. For proto-items I'm quite happy to just get some statements on them, with or without references.
These days most automated tools and bots add references, right? Do you have some examples without references? I'm especially interested in items that don't link to Wikipedia. Multichill (talk) 14:42, 15 March 2019 (UTC)
Sometimes applying Help:Edit summary (Q4533519) is enough. --Succu (talk) 21:54, 15 March 2019 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Thanks, all, for the comments. Clearly further action is inappropriate at this time. - PKM (talk) 19:52, 17 March 2019 (UTC)

Spanish speaker needed to check on Q8778498

España sagrada (Q8778498) is for a book published in the 1700s, but one of its authors died in 1972. It seems odd, but the sources point to Spanish Wikipedia and I do not speak Spanish. Can someone who does verify? --Jarekt (talk) 03:12, 15 March 2019 (UTC)

  • While the first volumes came out in the 1700s, parts of this were published well into the 19th century, and there have been various later editions, including one in the present century.
  • I presume you are referring to Ángel Custodio Vega (Q6173105). That looks like an error. According to es-wiki, he gave a lecture about "España Sagrada" in June 1950, La "España Sagrada" y los Agustinos en la Real Academia de la Historia; the lecture and some related discussion were published later that year. - Jmabel (talk) 03:29, 15 March 2019 (UTC)
Thank you for looking into it. So we should remove Ángel Custodio Vega (Q6173105) from list of authors. Right? --Jarekt (talk) 18:44, 15 March 2019 (UTC)
@Jarekt: Yes. - Jmabel (talk) 19:53, 15 March 2019 (UTC)

Wikidata - now the most edited wikimedia website

fwiw, the number of #wikidata edits caught up with the number of en #wikipedia edits at 14:05 on 19 March 2019, at 883,173,630,[5] and thus wikidata is now the most edited wikimedia website. Yay us. --Tagishsimon (talk) 14:25, 15 March 2019 (UTC)

March 19? What's it like in the future? :P Also, yay! Nicereddy (talk) 18:56, 15 March 2019 (UTC)

That's very exciting! :D
I wonder if this is partially due to wikidata-driven infoboxes becoming more popular thus convincing Wikipedia editors to contribute to Wikidata. ElanHR (talk) 21:55, 15 March 2019 (UTC)
@Tagishsimon: In a highly automated project this is not very surprising. Any idea about the amount of real manual edits? --Succu (talk) 22:03, 15 March 2019 (UTC)
Plus the tendency to add labels in different languages one by one. Ghouston (talk) 00:41, 16 March 2019 (UTC)
Plus the fact that users can only change a single item at a time. If Wikipedia only allowed you to add one sentence per edit... well, everyone probably would have abandoned the project on January 17, 2001. Animalparty (talk) 01:32, 17 March 2019 (UTC)

Code review request to fix Wikidata Tours

Hi all

After some digging the bug which causes Wikidata Tours not to load properly has been identified and some code has been written which works on Could someone who is able to review and approve code please take a look? Having Tours working will be extremely helpful for new contributors.

Thanks very much

John Cummings (talk) 14:50, 15 March 2019 (UTC)

From the task it isn't very clear how it should be fixed. Post the exact steps for fixing it to MediaWiki talk:Guidedtour-lib.js. Matěj Suchánek (talk) 16:40, 15 March 2019 (UTC)
@Sebastian Berlin (WMSE):, could you do this? --John Cummings (talk) 16:48, 15 March 2019 (UTC)

Indicating verified accounts

There seem to be two approaches for indicating "verified" accounts on social media - has quality (P1552):verified account (Q28378282) 110 times, or has quality (P1552):verified badge (Q48799541) (example query for twitter)

Overall, a reasonably even split between calling it an "account" or a "badge". I don't have any strong feelings on which one we use, but it feels like we ought to be consistent and pick one. Any thoughts as to which is more appropriate? They have quite different class hierarchies but seem to be tied together by has effect (P1542) and has cause (P828). Andrew Gray (talk) 23:30, 15 March 2019 (UTC)

verified account (Q28378282) seems like the right one to me, since that would be an account quality, while verified badge (Q48799541) would be a badge quality. Ghouston (talk) 00:39, 16 March 2019 (UTC)
I agree that verified account (Q28378282) reads like a better target for has quality (P1552), though verified badge (Q48799541) does seem like it's a more precise item to use, and might have been the more obvious choice with a slightly different qualifier property. It's probably also worth noting that instance of (P31) seems to be used even more as a qualifier for this purpose (>300 times on Twitter username (P2002), all with verified account (Q28378282)), so if we're going to run a bulk migration to tidy these up, it might be worth looking at fixing those too. --Oravrattas (talk) 07:34, 16 March 2019 (UTC)
Oh, well spotted - I didn't think to look for non-standard qualifiers. Instagram username (P2003) (which allows P31 in its constraints) has 59 "verified account"; Instagram username (P2003) has 131; YouTube channel ID (P2397) has seven (plus a couple of values for other things). So those are consistent, at least, even if possibly on the wrong qualifier. Andrew Gray (talk) 11:42, 16 March 2019 (UTC)
Great initiative! One observation: not all verified accounts have verified badges, but all verified badges belong to verified accounts. Facebook specifically operates with different colored badges to reflect this, Youtube has verified music accounts with or without badges, there are probably more examples. Moebeus (talk) 12:11, 16 March 2019 (UTC)
I'm thinking it would be a good idea to rename en:Verified badge to Verified account, for this reason. It has already gone wrong with "In February 2012, Facebook introduced verification badges for profiles and pages", trying to describe all such systems as "badges". Ghouston (talk) 22:05, 16 March 2019 (UTC)
But that name redirects to en:Account verification. Amazingly, the Verified badge article doesn't mention the former. Perhaps these two articles should be proposed for merging. Ghouston (talk) 22:02, 17 March 2019 (UTC)
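
For anyone wanting to repeat the counts discussed in this thread before a bulk migration, the query pattern could be generated along these lines. This is a sketch only: the helper name `qualifier_count_query` is made up for illustration, it just builds a SPARQL string (it does not contact the query service), and it assumes the standard `p:`, `pq:` and `wd:` prefixes that the Wikidata Query Service predefines. The property and item IDs are the ones mentioned above.

```python
from itertools import product


def qualifier_count_query(id_property: str, qualifier: str, value: str) -> str:
    """Build a SPARQL query counting statements of id_property that carry
    the qualifier/value pair (for example P1552 with Q28378282)."""
    return (
        "SELECT (COUNT(?statement) AS ?count) WHERE {\n"
        f"  ?item p:{id_property} ?statement .\n"
        f"  ?statement pq:{qualifier} wd:{value} .\n"
        "}"
    )


# Twitter username (P2002) with both qualifier properties and both targets
# mentioned in this thread.
qualifiers = ["P1552", "P31"]              # has quality, instance of
targets = ["Q28378282", "Q48799541"]       # verified account, verified badge

for qualifier, target in product(qualifiers, targets):
    print(qualifier_count_query("P2002", qualifier, target))
    print()
```

Pasting each generated query into the query service would give the four counts for Twitter; swapping `P2002` for another identifier property covers the other sites.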

Rank for fictitious authors listed alongside actual authors of a paper

At The Morphology of Steve (Q50422077), there is some disagreement whether only the 3 actual authors should have normal or preferred rank in author (P50) and the others deprecated rank.

The reference for this was added but deleted by some user. --- Jura 09:42, 16 March 2019 (UTC)

Notified participants of WikiProject Books (pinging project). --- Jura 09:45, 16 March 2019 (UTC)

Why have you repeatedly marked the named, cited, editors as deprecated? some user; Talk to some user 14:52, 16 March 2019 (UTC)
This sounds like a technicality in how "author" is defined: the names listed on the article, or the people who actually wrote it? They will almost always be the same, or at least assumed to be the same for lack of other evidence. Ghouston (talk) 22:08, 16 March 2019 (UTC)
I suppose a relatively common situation would be ghostwriter (Q623386) or some kind of fraud in publishing somebody else's text. Ghouston (talk) 22:16, 16 March 2019 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── If the named authors were not consulted, then qualifying with a "has role"-"unconsulted author" may be appropriate; deprecation - and especially deprecation with no reason as a qualifier - is not. This has nothing to do with ghostwriting. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:43, 17 March 2019 (UTC)


  • To return to the original question, what do others think? --- Jura 20:09, 17 March 2019 (UTC)
I think we agree that an author can be added, even if not stated on the publication, if there's a reliable reference that says they were an author. So it's consistent to also say that authors can be removed if there's a reliable reference that says they didn't contribute to it in any way, at least in cases like this where there's no dispute. If you were generating a list of works for one of those fake authors, it doesn't seem right to include this article. Ghouston (talk) 23:05, 17 March 2019 (UTC)
No, that's not consistent; not what was proposed (the issue was around improper deprecation, not removal); and not something we agree on. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 00:21, 18 March 2019 (UTC)
  • I think Ghouston's point is supported by Help:Ranking. --- Jura 20:04, 18 March 2019 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── There is nothing on Help:Ranking that supports removing such data. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:34, 18 March 2019 (UTC)

Where I said removal, I'd support deprecation as a better alternative, and maybe a new reason for deprecation. Ghouston (talk) 22:32, 18 March 2019 (UTC)

Importing UIDs for people, from Wikispecies

The page I have just created at species:Wikispecies:Biographies with no identifiers contains a Wikidata query which returns a list of people with a Wikispecies biography, but with no UIDs (VIAF, ISNI, ORCID, IPNI, Zoobank, etc) on Wikidata.

There are currently 20,183 people in the list! Some of them, such as species:A. Murdoch, have an ID (Zoobank, in this case) as part of links in the Wikispecies article text. These are often not templated (as in Murdoch's page), or use a template which is ambiguously used for both people and works, such as species:Template:ZooBank, where we have two or more ID properties (e.g. ZooBank author ID (P2006)/ZooBank publication ID (P2007), so the HarvestTemplates tool cannot be used).

Can anyone help with automating the importing of such values? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:50, 16 March 2019 (UTC)

You can harvest in demo mode, download the data and then process it further using other tools such as PetScan and PagePile. Thierry Caro (talk) 20:28, 16 March 2019 (UTC)
Thank you. That's a useful tip, but it turns out to have a very small return. It seems the vast majority of such IDs are not in templates. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:48, 16 March 2019 (UTC)
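
Since most of these IDs sit in plain links rather than templates, one possible route is to scan the page wikitext for ZooBank links and use the link path to decide which property applies. The sketch below rests on assumptions that would need checking against real pages: that author links have the form `zoobank.org/Authors/<UUID>` and publication links `zoobank.org/References/<UUID>`, and that the ID is a standard UUID. The function name `extract_zoobank_ids` is hypothetical, and the sample ID is invented purely for the demonstration.

```python
import re

# Assumed shape of a ZooBank identifier: a standard 8-4-4-4-12 hex UUID.
UUID = r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}"

# Assumed link shapes; the path segment distinguishes people from works.
ZOOBANK_LINK = re.compile(r"zoobank\.org/(Authors|References)/(" + UUID + r")")

KIND_TO_PROPERTY = {
    "Authors": "P2006",     # ZooBank author ID
    "References": "P2007",  # ZooBank publication ID
}


def extract_zoobank_ids(wikitext: str) -> list:
    """Return (property, id) pairs for every ZooBank link found in the text."""
    return [
        (KIND_TO_PROPERTY[kind], uuid.upper())
        for kind, uuid in ZOOBANK_LINK.findall(wikitext)
    ]


# Invented sample text; the UUID below is not a real ZooBank record.
sample = "* [http://zoobank.org/Authors/12345678-ABCD-ef01-2345-6789abcdef01 ZooBank]"
print(extract_zoobank_ids(sample))
```

On a biography page only the P2006 hits would be candidates for the person's item; P2007 hits belong to works and would need to be filtered out or routed elsewhere.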


My Wikidata is set to the Welsh language. I'm trying to find out how to translate the 'potential issues' box that can be seen in the following screenshot. Could anyone explain how I can translate this? Thanks

Screenshot of wikidata.png

Johnogwen123 (talk)

@Johnogwen123: I think you need to find the relevant message at A separate account on that website is needed to contribute translations. Jc86035 (talk) 10:14, 17 March 2019 (UTC)


Can someone lock Fiona Caroline Graham (Q256916)? There is a movement to remove the birth information of geishas, because they are never supposed to reveal their age. A fan of this geisha from Australia and, I think, a second fan from Brazil have a campaign to remove her age from all the Wikipedias (or it is one person spoofing a Brazil IP). They emailed me asking to remove it; when I said the info came from the Library of Congress, they wrote to the LOC to have it removed from their website. We now link to the cached version. However, it is public information, and accurate. Even if the geisha age rule was a real thing, the person in the entry is not a licensed geisha according to the Asakusa Geisha Association. --RAN (talk) 00:26, 17 March 2019 (UTC)

Semi'd for a month. Take protection requests to WD:AN next time, RAN. Mahir256 (talk) 03:20, 17 March 2019 (UTC)

Proper way to mark uncountable/mass nouns as such?

See Lexeme:L4592 for an example of an uncountable/mass noun. Gold is a word, but golds isn't. Is the correct way to mark uncountable nouns as such has quality (P1552) with mass noun (Q489168)?  – The preceding unsigned comment was added by SixTwoEight (talk • contribs).

Pretty sure Lexicography uses instance of (P31) just fine. Judging by the examples on its page, has quality (P1552) seems to call for a quality that can be further defined (animacy (Q1250335) is a quality, inanimate (Q51927539) is not). Instances of noun have a grammatical gender (P5185) property that applies to them, but it is the word class of noun/adjective/determiner which, depending on language, has quality (P1552) of animacy (Q1250335) or grammatical gender (Q162378).
Being a mass noun is clearly not a quality to me; it's a lexical category (the actual quality, for which no item exists yet, is "countability"). Honestly, looking at uses of "has quality", I don't see a lot that unambiguously belong there, but I suspect that the problem is really that the base properties to use for lexemes saw basically no discussion whatsoever before the feature was launched, so it's a massive mess of inconsistent usage because nothing is being checked by bots.
I think implementing word classes as a weird supra-category that is not actually handled via an actual property was an error, as it is causing people to assume it basically supplants and prevents use of instance of (P31) for the entire Lexeme namespace. Circeus (talk) 01:27, 17 March 2019 (UTC)
  • There is some debate how and if P31 should be used in Lexeme namespace. (To those not aware of it: each entity in Lexeme namespace has a lexical category (e.g. noun) and language defined. Lexical categories are similar to P31 statements)
    For the above, some users use has quality (P1552) (as you did). This does seem compatible with the property definition.
    Others use instance of (P31) for the same, even if they don't use P31 in all other cases. The result is that a query based on P31/P279 that isn't limited to property entities and item entities includes some odd results.
    A third option could be to add the lexical category systematically as a statement in P31 or have Wikibase generate a P31-triple automatically based on the lexical category. --- Jura 09:00, 17 March 2019 (UTC)

Scientific names of taxa should be separate entities from the taxa themselves

Currently, a scientific Latin name for an organism is a property of a taxon, rather than an entity in and of itself. However, this causes inconsistencies. Each Latin name has one or more authors, an associated protolog, a publication and a type specimen in a collection. These pieces of information are only related to the Latin name and not the taxon. The taxon is a scientific concept and the earliest valid name is chosen as a label for this concept. This problem with the data model means that the International Plant Names Index (Q922063) identifiers are being used as properties of taxa, which they are not. They are identifiers for published Latin names (valid or not). This is causing me problems because I want to link nomenclatural type specimen details with the names that they are types of, but I can only link them to taxa.

How do we go about getting this change?

Is there a will to fix this?

How do we represent type specimens in Wikidata and their links to names?

I imagine it will be painful, but it is better to do this sooner rather than later Qgroom (talk) 09:20, 17 March 2019 (UTC)

  • I agree with the general problem and think it would be desirable to have the taxa in the Q-namespace and the names in the L-namespace, where names belong. ChristianKl❫ 20:57, 17 March 2019 (UTC)
At this stage, this would involve splitting hundreds of thousands of items from each other. As much as I like the idea myself, I don't think it's realistically feasible whatsoever (not to mention how the distinction is completely lost on most people who are not deeply familiar with codes of nomenclature for organisms). Circeus (talk) 11:29, 17 March 2019 (UTC)
It might be hard, but if the data model is demonstrably wrong, isn't the situation only going to get worse the longer it is left? It may be hundreds of thousands of items now, but it will be millions of items eventually. The problem will mean that Wikidata will fail to be useful for biodiversity informatics, where it could be a real game changer. Wikidata's unique selling point is that it can be fixed. Qgroom (talk) 12:15, 17 March 2019 (UTC)
@Qgroom: Some thoughts from 2016. --Succu (talk) 13:06, 17 March 2019 (UTC)
  • My outside impression is that we had someone complain the other day about Wikidata doing the opposite? --- Jura 11:51, 17 March 2019 (UTC)
The opposite of what? Qgroom (talk) 12:16, 17 March 2019 (UTC)
  • The opposite of what you are complaining about by creating new items for every name. --- Jura 12:20, 17 March 2019 (UTC)
I suppose Wikidata could get so big that it just grinds to a halt, but there are an estimated 9 million species. Not all of these are described, but even if they were and each one had two names then it is not going to be a number that Wikidata can't handle. Apparently, there are 2.5 million scientific publications published every year and these are getting added to Wikidata Qgroom (talk) 12:33, 17 March 2019 (UTC)
  • I think it's already being done. If homo sapiens is or was also called something else, there would be a separate item for that name. We just generally don't have an item like Q5. --- Jura 12:45, 17 March 2019 (UTC)
Although Q5 has the description "common name of Homo sapiens..." it is clear from its properties that it is referring to much more than the name. I suspect Q5 and Q15978631 should be merged, because they both describe the concept of humanness and neither describes the name, whether Latin or English.  – The preceding unsigned comment was added by Qgroom (talk • contribs) at 12:58, 17 March 2019‎ (UTC).


Wikidata asserts that this is a picture of a common name

This is certainly a serious concern, which is stopping the use of Wikidata in other WMF projects, and externally. It's also internally incongruent - we don't define table (Q14748) as "name of a piece of furniture". One of the many prior discussions is "What heart rate does your name have?". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:59, 17 March 2019 (UTC)

Thanks @Succu: and @Pigsonthewing: for those links to earlier discussions on the topic. I had imagined the subject was going to be difficult, I was not wrong.  – The preceding unsigned comment was added by Qgroom (talk • contribs) at 14:12, 17 March 2019‎ (UTC).
@Pigsonthewing: No, please don't merge human (Q5) and Homo sapiens (Q15978631); they reflect closely related but distinct concepts. When we started out reflecting human genes on Wikidata, we used the statement found in taxon (P703) human (Q5); however, this led to inconsistencies, basically because human (Q5) is not a taxon (Q16521) and as such didn't fit with genetic models. You could argue that simply making human (Q5) instance of (P31) taxon (Q16521) fixes this; however, there are examples (e.g. Lexa (Q23023325)) where an item is rightly instance of (P31) human (Q5), but not instance of (P31) Homo sapiens (Q15978631). --Andrawaag (talk) 10:13, 18 March 2019 (UTC)
Don't worry, I've no intention of messing with humans. The point is, humans are exceptional in a number of ways and in this case we should not get deflected by these special cases. Qgroom (talk) 11:49, 18 March 2019 (UTC)
@Andrawaag: Where did I propose such a merge? I've fixed Q23023325, which should not have been using Q5. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:09, 18 March 2019 (UTC)
@Pigsonthewing: You were the first signature after "I suspect Q5 and Q15978631 should be merged". It seems that I was wrong there, apologies. --Andrawaag (talk) 22:23, 18 March 2019 (UTC)
I'm hardly a Wikidata expert, but about all I do on here relates to taxa. Here are some thoughts:
  1. Many common names relate to more than one taxon, so if this separation happens, it needs to accommodate many q-items using the same common name item.
  2. Many taxa already have more than one scientific name, mainly in the form of synonyms. I am under the impression that each synonym should have its own separate q-code. Even if this is not the desired policy, it is certainly happening in many cases. In these cases, it approaches the situation requested above. I think if we want one item for a common name and scientific name, we'd have to combine more than one scientific name. Personally, with how names change back and forth, I think this may be unwise.--NessieVL (talk) 01:41, 19 March 2019 (UTC)

Here are some thoughts...

Use cases
  • Finding where type specimens are
  • Testing that there is only one holotype, lectotype or neotype.
  • Testing other rules of nomenclature
  • Testing that the authorship of names is correct
  • Finding the literature on the naming of taxa
  • Linking names of taxa to authors
  • Testing that the name has been legitimately typified
  • Identifying syntype material
Ways forward?
  • Ignore the cases of cats, dogs and humans and start with the more obscure and therefore less controversial taxa.
  • Continue labelling taxon concepts as they are now as Instance of P31 Taxon Q16521 with a label, such as "Viola lutea"
  • Add new scientific name items as Instance of P31 name Q82799 (or preferably a new item class - scientific Latin name)
  • Label these new scientific name items with something like "Viola lutea Huds. (name)"

Doubtless I'm being too naive, but if there are ways to breakdown a difficult problem into manageable chunks then perhaps it is solvable.  – The preceding unsigned comment was added by Qgroom (talk • contribs) at 14:12, 17 March 2019‎ (UTC).

PS: One thing I note is that previous discussions have conflated taxonomy and nomenclature. The latter is concrete, has tightly defined rules and is therefore much more tractable in a database than taxonomy, which often comes down to a matter of opinion.  – The preceding unsigned comment was added by Qgroom (talk • contribs).
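
To make the proposed split more concrete, here is a rough sketch of names as entities separate from taxa, together with one of the use cases listed above (testing that there is only one holotype, lectotype or neotype). Everything in it is illustrative: the class names, fields and the specimen identifiers are hypothetical and do not correspond to existing Wikidata properties, and the check reads the rule as "no primary type kind may occur more than once per name", which is only one possible interpretation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

PRIMARY_KINDS = {"holotype", "lectotype", "neotype"}


@dataclass
class TypeSpecimen:
    identifier: str  # hypothetical collection or catalogue number
    kind: str        # "holotype", "lectotype", "neotype", "syntype", ...


@dataclass
class ScientificName:
    label: str  # e.g. "Viola lutea Huds. (name)"
    authors: List[str]
    types: List[TypeSpecimen] = field(default_factory=list)


@dataclass
class Taxon:
    label: str  # e.g. "Viola lutea"
    names: List[ScientificName] = field(default_factory=list)


def primary_type_violations(name: ScientificName) -> List[str]:
    """Return the primary type kinds that occur more than once for a name."""
    counts = Counter(t.kind for t in name.types if t.kind in PRIMARY_KINDS)
    return sorted(kind for kind, n in counts.items() if n > 1)


name = ScientificName(
    label="Viola lutea Huds. (name)",
    authors=["Huds."],
    types=[
        TypeSpecimen("specimen-1", "holotype"),
        TypeSpecimen("specimen-2", "holotype"),  # deliberately wrong
        TypeSpecimen("specimen-3", "syntype"),
    ],
)
taxon = Taxon(label="Viola lutea", names=[name])
print(primary_type_violations(name))
```

The point of the sketch is that authorship and type specimens hang off the name rather than the taxon, so a check like this has something well defined to run against, while the taxon can still carry several names (for example synonyms).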

  • I think we should take inspiration from Darwin Core, with its occurrence system that allows information about specific specimens to be stored in, and then retrieved from, databases. If we allow that a specific specimen, or group of specimens, can have an item, then it will be easy to store a lot of information, including something like subject has role (P2868) holotype (Q1061403) of (P642) the taxon of your choice.
    Furthermore, with Wikimedia Commons and Wikidata becoming more and more closely bound, it would be great if one day we could use automated tools to import the images of free datasets into Wikimedia Commons and the related information into Wikidata; for an example, see this occurrence and this image manually uploaded to Commons, where the related information is currently poorly rendered and used (and the work is too large to be done manually in the long run). I dream that one day the free datasets of GBIF will be imported into the Wikimedia projects (media in Commons and information in Wikidata). Christian Ferrer (talk) 21:30, 17 March 2019 (UTC)
Yes, but one specimen can be the nomenclatural type of multiple names and be cited in several publications. So it doesn't get around the fact that the name is an entity in and of itself. All the nodes of the biodiversity knowledge graph are individual entities (
If you have an issue with the current system, then with the name as an entity, you move this issue to this new entity, but you solve nothing. Christian Ferrer (talk) 12:15, 18 March 2019 (UTC)
For example, here are Dermechinus horridus (Q2743032) and the synonym (also the original combination) Echinus horridus (Q62085775). If you have two different names and you need both of them here for a structured data purpose, then create a new item, as I show in my example, or as many as you need. Christian Ferrer (talk) 12:25, 18 March 2019 (UTC)
Two properties pointing to the name

I am wondering if the convention used in Wikicite for author names, of using both author (P50) and author name string (P2093), can apply here as well. The ideal solution would be to change taxon name (P225) to accept items instead of strings, but given that taxon name (P225) is already in widespread use, changing it seems impossible. Hence, might mimicking the Wikicite solution for authors help? Just my 2cts. --Andrawaag (talk) 22:06, 17 March 2019 (UTC)

Yes, I think that taking inspiration from the item/string split between author (P50) and author name string (P2093) looks promising. In that vein, I think the current taxon name (P225) should remain as a string property, and the only thing we would have to change there would be the labels of P225, to "taxon name string" (and equivalents in other languages). This would then be complemented by a new property "taxon name" (or perhaps better "taxon name item", to reduce the ensuing confusion) that would point to an item. Once we have a P50 statement on a publication item, we usually remove the P2093 one (storing its value via stated as (P1932)). I think the "taxon name string" statement would eventually have to be deleted from the taxon item (and migrated to the taxon name item), but perhaps we should allow for some more time here than for the P2093-to-P50 conversion. --Daniel Mietchen (talk) 03:23, 18 March 2019 (UTC)
No. That will not work. Properly done, a taxon has four parts; the name (ie taxon), the author (ie possibly multiple names), the publication and the date. This is what it takes to uniquely describe a taxon. I have said it before and I say it again. Thanks, GerardM (talk) 06:23, 18 March 2019 (UTC)
I am not sure I understand. Isn't the issue here that there is a need to distinguish between a taxon and a taxon name? You argue that a taxon is described by 4 concepts of which its name is one. But you also argue that the name is the taxon, which at the same time is one of the four characteristics of a taxon. Isn't this the kind of Droste effect being addressed here? --Andrawaag (talk) 09:36, 18 March 2019 (UTC)
I think the analogy with author (P50) and author name string (P2093) is a good solution that is non-disruptive, but allows progress. How do we get this done? Qgroom (talk) 11:57, 18 March 2019 (UTC)
  • There is no need to separate the taxon from the taxon name; an item about a taxon is, and should stay, an association of different things (name, author, date). In the case of synonymy this is no longer the same association, so a new item is needed and has to be "instance of/ synonym of..", to the extent, of course, that we think it relevant to have this item here (for example in the case of an original combination); we don't need to list all synonyms IMO, though the debate can stay open. But in every case where a species is "renamed" and the new name is accepted, we must absolutely create a new item, not change what is written in the string field "taxon name", as is sometimes done here. To come back to a potential separation of the taxon name: this will solve none of the potential issues (if any), it will just complicate things. A taxon is absolutely not "a thing with potentially different names", because when the names are different we are talking about different taxa. A taxon without "taxon name" is not a taxon. We are not going to reinvent science. Christian Ferrer (talk) 12:12, 18 March 2019 (UTC)
@Christian Ferrer: "A taxon without 'taxon name' is not a taxon" – the ICZN in the glossary entry for "taxon" says it's a taxonomic unit whether named or not, so you are using a different definition to the Code. Peter coxhead (talk) 17:27, 19 March 2019 (UTC)
@Peter coxhead: There is no need to push the ontology so far. As data, yes of course, a taxon is not a taxon without a name and without the resulting properties of that taxon name. Christian Ferrer (talk) 17:59, 19 March 2019 (UTC)
There is a need. As said before, a taxonomic name is linked to a type specimen, an author and a protologue, and these are not properties of the taxon; they are properties of the name. The taxon is a biological hypothesis and is much more fluid in its scope. There is no single authority for which name is accepted. What is accepted is always a matter of opinion. On the other hand, taxonomic names are concrete, founded on clear rules. Taxonomic synonyms are not just different names for the same thing. A name can even exist if we don't know what taxon it is supposed to represent. These names are an important link between the biology, collections and scientific literature. Qgroom (talk) 14:21, 18 March 2019 (UTC)
Each taxon item here currently has only one taxon name, and almost all, if not all, of the other properties (including, in addition to those you mention yourself, the parent taxon and all external identifiers) relate to that specific name. You said it very well: if you move the name then you need to move all the rest, therefore you move nothing. The only thing that can stay, and not in all cases, is the common name. Christian Ferrer (talk) 17:46, 18 March 2019 (UTC)
@Christian Ferrer: each item incorrectly claimed to be an instance of a taxon has only one taxon name, because it is actually an instance of a taxon name, not a taxon. See #Previous discussion below. Many taxa are represented here by multiple taxon names. Peter coxhead (talk) 17:02, 19 March 2019 (UTC)
@Peter coxhead: Yes and no. What would be an instance of taxon otherwise? What would the label be? It is just a concept that takes on consistency only when it is named and defined. And as I noted above, absolutely all the other properties currently used depend on that name and definition; therefore, if you have an item for the taxon name, you have all the other properties in this item too (external identifiers, rank, parent taxon, publications, author, sitelinks, etc.). What would be in this "taxon" item, in addition to a property taxon name whose value is the Q-item of the taxon name? Absolutely nothing, because everything flows from this taxon name and from all the properties that are unique to that taxon name. The only thing it would allow is adding multiple values to that taxon name property, but ultimately that brings nothing, because currently you can create as many synonym items as you need. Therefore such a complete change would create a lot of issues and solve none. Christian Ferrer (talk) 17:50, 19 March 2019 (UTC)
Assuming that we were changing "instance of taxon" to "instance of taxon name", not a single property of Holothuria scabra (Q2395506) would change. And then what? The only result would be a potential (big) disruption of the Wikimedia projects that currently use tools that work with "instance of taxon", but what would be new? What is the point of materializing the "true" concept of "taxon" in structured data? What would the name of that "taxon" item be, what label? The accepted name, maybe? But this is the current situation! The other names are synonyms, which they can already be, so what is new? Christian Ferrer (talk) 19:00, 19 March 2019 (UTC)
In my opinion Wikicite is a disaster creating wrong titles, duplicated items etc. example often making it hard to add basionym (P566) or original combination (P1403) based on sources. --Succu (talk) 21:28, 18 March 2019 (UTC)
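
The item/string split proposed in this subsection (by analogy with author (P50) and author name string (P2093)) can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the `taxon_name_item` field, the `migrate_name` helper and the placeholder identifier are all hypothetical and do not correspond to existing Wikidata properties or tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaxonNameItem:
    """Hypothetical item about a name; would carry nomenclatural data."""
    qid: str                               # placeholder identifier, not a real QID
    name: str
    author_citation: Optional[str] = None  # left empty in this sketch


@dataclass
class TaxonItem:
    qid: str
    # Today's taxon name (P225), kept as a plain string ("taxon name string").
    taxon_name_string: Optional[str] = None
    # Hypothetical new property ("taxon name item") pointing to an item.
    taxon_name_item: Optional[TaxonNameItem] = None


def migrate_name(taxon: TaxonItem, name_qid: str) -> TaxonItem:
    """Move the string value to a name item, as is done when P2093 is replaced by P50."""
    if taxon.taxon_name_string is None:
        raise ValueError("nothing to migrate: no taxon name string present")
    taxon.taxon_name_item = TaxonNameItem(qid=name_qid, name=taxon.taxon_name_string)
    taxon.taxon_name_string = None
    return taxon


taxon = TaxonItem(qid="Q2743032", taxon_name_string="Dermechinus horridus")
migrate_name(taxon, name_qid="Q-NAME-PLACEHOLDER")
print(taxon.taxon_name_item.name)
```

Whether the string statement should be removed immediately or kept for a transition period is exactly the open question raised above; the sketch removes it only to keep the example short.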

Possible solution

With our existing infrastructure we can use lexemes for the latin names and use item for this sense (P5137) to link them to our taxon items, the new lexeme/sense items can get the information about taxon author and publication. A bot could simply do this for all the existing taxons that we have. As a next step we can merge items that describe the same taxon via the existing information from taxon synonym (P1420). Does anybody see problems with doing that? ChristianKl❫ 12:09, 18 March 2019 (UTC)

This solution is the opposite of the original proposal, though, which tends to divide rather than to merge. For a structured purpose, e.g. if we need to link a specimen as being a specific type, for example a "holotype", of a specific taxon with a specific name (even if it is not the currently accepted name), we need an item for this specific taxon. This is possible with the current system, see my example. You can very well have a specimen that is the holotype of Echinus horridus (Q62085775), while the accepted taxon is Dermechinus horridus (Q2743032). What is needed for that purpose is that we allow the creation of items for specific zoological specimen (Q2114846) or something similar. No, really: in taxonomy, the same thing with two different names is not the same thing but two different things, and we need to keep the items separate. Christian Ferrer (talk) 12:55, 18 March 2019 (UTC)
Yes, I agree Dermechinus horridus (Q2743032) and Echinus horridus (Q62085775) should be merged, because they refer to exactly the same species concept. However, these items have conflated name information and taxon information. So that the properties taxon author citation (P6507) and publication in which this taxon name was established (P5326) would then need cleaning up. If names were separated from the species concept then multiple species concepts could coexist, as they already do in the real world. Qgroom (talk) 14:35, 18 March 2019 (UTC)
I fail to understand why we should clean up the original publication of the original name; this is useful information. The author citations of both items are useful as well, since both (different) citations are currently in use, example. Furthermore, a taxon name is linked to one specific author citation, therefore if you quote the name somewhere you have to quote the author citation too. Christian Ferrer (talk) 18:13, 18 March 2019 (UTC)
@Christian Ferrer: Why do you consider that to be important? Why isn't it enough to store the original name in the source document with some form of 'stated as'? ChristianKl❫ 15:48, 18 March 2019 (UTC)
@ChristianKl: Here the taxon items currently need a rank, a name and a parent. A taxon synonym obviously has a different name, may have a different rank (e.g. a species can be a synonym of a subspecies), and may have a different parent taxon (in fact everything is, or may be, different, as pointed out there). I fail to understand the point of merging such things, and in addition I fail to understand how a taxon chain like that could work well. Though the fact that I don't understand it does not make it impossible. Christian Ferrer (talk) 17:58, 18 March 2019 (UTC)
I've listed a few of the use-cases above. These can't be resolved by treating taxonomic names and taxa as the same thing, nor by treating names as mere labels. The inconsistencies are already apparent in the Dermechinus horridus (Q2743032) and Echinus horridus (Q62085775) example. Qgroom (talk) 17:40, 18 March 2019 (UTC)

@ChristianKl: I'm a Wikidata beginner. Could you point me to some material to explain your lexeme proposal? Qgroom (talk) 14:40, 18 March 2019 (UTC)

Lexemes are entities that represent how a given concept is called in a given language. They were recently introduced to map to Wiktionary entities. provides more information. ChristianKl❫ 15:48, 18 March 2019 (UTC)
I doubt binomen (Q864016) should be treated as lexemes. --Succu (talk) 20:46, 18 March 2019 (UTC)
Claiming that Dermechinus horridus (Q2743032) and Echinus horridus (Q62085775) refer to "the same species concept" is a huge stretch. In almost all cases it is impossible to say to what species concept a name refers without citing at least one actual taxonomic paper. There are very few Wikidata items that refer to a particular species concept; this is only possible if the taxon is very well-known and uncontroversial or if an item is defined in terms of a particular taxonomic position.
        The basic issue is that scientific names can be databased very easily, and one name per item allows adding all nomenclatural information, such as authorship of the name, and any detail on typification (in principle, we may need extra properties).
        On the other hand taxa can not be databased at all (with a few exceptions) except by using scientific names. And scientific names can refer to any number of differently defined taxa.
        Having one item for one scientific name allows any bit of data from the literature to be databased accurately. Recording information accurately surely should be the foundation of any database policy. - Brya (talk) 17:51, 19 March 2019 (UTC)
"refer to "the same species concept" is a huge stretch." yes, and this is why there is currently two different items. Christian Ferrer (talk) 18:03, 19 March 2019 (UTC)


We have taxonomic type (P427) to add them. What we are missing is a data model for types at species rank and below. These are your major use cases above, Qgroom. How should we do this? --Succu (talk) 19:09, 18 March 2019 (UTC)

taxonomic type (P427) is about type species, despite the similar sounding name, this is something completely different to a type specimen. Qgroom (talk) 20:17, 18 March 2019 (UTC)
No. Please see Rausch 572 (Q19359611). --Succu (talk) 20:30, 18 March 2019 (UTC)
I don't understand. Rausch 572 (Q19359611) is a type specimen; it is not necessarily a taxonomic type (P427) as it is defined (BTW that's a terrible label). Qgroom (talk) 21:27, 18 March 2019 (UTC)
Hopefully Acanthocalycium thionanthum (Q337710) and Acanthocalycium ferrarii (Q337692) are not based on the same type. If that were the case, replaced synonym (for nom. nov.) (P694) should be applied. --Succu (talk) 21:42, 18 March 2019 (UTC)
The replaced synonym (for nom. nov.) (P694) is being used incorrectly in the item Acanthocalycium thionanthum (Q337710). The definition of replaced synonym (for nom. nov.) (P694) is "the type genus of this family (or subfamily, etc), or the type species of this genus (or subgenus, etc)". It is therefore referring to a taxon, not a specimen. So it can't be Rausch 572. This is the sort of logical inconsistency I want to resolve by making a clear distinction between taxa and taxonomic names. --Qgroom (talk) 05:51, 19 March 2019 (UTC)


Where to put them? Do Wikimedia projects describe taxon concepts or only list names according to a special source? This includes the automatic creation of items here based on an external id. --Succu (talk) 19:38, 18 March 2019 (UTC)

Note that I don't know exactly how that works but in case of synonym in Wikimedia Commons the links may be given, example. Christian Ferrer (talk) 19:58, 18 March 2019 (UTC)
Commons prefers this taxonomic viewpoint. This includes the renaming of pictures to their preferred viewpoint. A very bad practice. --Succu (talk) 20:17, 18 March 2019 (UTC)
Wikimedia uses taxon concepts, anything else would be odd. The case of Index Fungorum you mention is interesting. Index Fungorum is a list of names, like IPNI, however it links to Species Fungorum where a list of accepted taxa is maintained. A clear case of maintaining separation between nomenclature and taxonomy. Qgroom (talk) 20:26, 18 March 2019 (UTC)
Hard to believe Wikimedia uses taxon concept (Q38202667). Hard to believe all Wikimedia projects follow the same concept (birds, mammals, …) unisono. --Succu (talk) 20:38, 18 March 2019 (UTC)
What I am pointing to is not Wikimedia Commons practice but the fact that the sitelinks can be retrieved from Rhodocybe nitellina (Q10434744) to Rhodophana nitellina (Q51954845), I guess with taxon synonym (P1420); the illustration can be seen in the Commons category that I pointed to. In summary, the current system is good: it does not matter that one project uses one name and another project uses another name, because the current system allows navigation despite that. And it is unrelated to the category redirect that you pointed to; there was a discussion about this feature on Commons, but I'm not able to find it. I know that taxon synonym (P1420) and our local infobox are involved. Christian Ferrer (talk) 21:11, 18 March 2019 (UTC)
The statement in the Wikimedia Commons example is wrong. Index Fungorum makes no judgement about whether Rhodocybe nitellina is an accepted name or a synonym. Index Fungorum is just a list of names. However, it does state that Rhodophana nitellina (Fr.) Papetti is the accepted name in Species Fungorum. Qgroom (talk) 21:26, 18 March 2019 (UTC)
Yes it does, though the link on Commons leads indeed to another page. But this is out of topic here, as we talk about sitelinks. Christian Ferrer (talk) 21:40, 18 March 2019 (UTC)

External IDs

Some of them are more dedicated to a nomenclatural act (eg. IPNI, IF, Mycobank, Zoobank should). A lot of them (NCBI, FishBase, IUCN …) follow a certain taxonomic viewpoint. That means the same ID can point to different scientific names. Others (e.g. WoRMS) are in between, they have Ids for accepted/valid names and earlier ones. Same question: where to put them? --Succu (talk) 20:06, 18 March 2019 (UTC)

Well that's why I'm asking for name items separate from taxon items, but perhaps the question was not directed at me. Qgroom (talk) 20:36, 18 March 2019 (UTC)
Where to put them at your proposed name centered items? --Succu (talk) 22:15, 18 March 2019 (UTC)

Previous discussion

The issue of representing taxon names versus taxa has already been discussed in great detail, more than once. See e.g. Property talk:P1420#data model.

The main points to be clear about first are:

  • A taxon is not the same as a taxon name. Explaining clearly is complicated by the different terminologies employed in the nomenclature codes, but using the ICZN here, a taxon or taxonomic unit, whether named or not, is "a population, or group of populations of organisms which are usually inferred to be phylogenetically related and which have characters in common which differentiate .. the unit .. from other such units."
  • A taxon can properly (validly, legitimately) have more than one name, if it is given a different placement within another taxon, and/or a change of rank. Thus precisely the same group of organisms may be given the name Muehlenbeckia florulenta (Q1101419) or the name Duma florulenta (Q18081078) depending on which genus a taxonomist considers them to belong to. Precisely the same group of organisms may be given the name Hyacinthaceae (Q13833438) or Scilloideae (Q133292) depending on whether the differences between them and other groups are considered sufficient to treat them as a separate family or to merge them into another family as a subfamily. There is no absolute right or wrong name in most such cases; it's a matter of legitimate taxonomic opinion.

For me, there are two issues:

  1. Wikidata should stop claiming that instances of taxon names are instances of taxa.
  2. Ideally, Wikidata should represent taxa as well as taxon names.

I have no idea why there seems to be resistance to (1).

(2) is, however, difficult, as is explained in detail at Property talk:P1420#data model. Here's another example. There are two ways of classifying the genus Hyacinthus used in the current botanical literature, as shown in the table below. 1–6 are the six taxa involved; the others are taxon names. (There's implicitly another taxon, with two possible names: for those who use the left hand column, Family Asparagaceae is a different taxon from #2, a taxon treated as a subfamily by those who use the right hand column).

#  Left-hand column           Right-hand column
1  Order Asparagales          Order Asparagales
2  -                          Family Asparagaceae
3  Family Hyacinthaceae       Subfamily Scilloideae
4  Subfamily Hyacinthoideae   Tribe Hyacintheae
5  Tribe Hyacintheae          Subtribe Hyacinthinae
6  Genus Hyacinthus           Genus Hyacinthus


  • the same taxon (defined as the same group of organisms) has more than one name – Subfamily Hyacinthoideae in the left-hand column is the same group of organisms as Tribe Hyacintheae in the right-hand column
  • the same name applies to different taxa – to know the composition (circumscription) of "Tribe Hyacintheae", you need to know which system is being used.

As far as I am aware, no-one has yet shown in detail with a worked example how best to represent data of this kind in Wikidata. Peter coxhead (talk) 12:26, 19 March 2019 (UTC)

This is mostly very good.
  • However, "A taxon can properly (validly, legitimately) have more than one name," is not a fruitful way to phrase it. A taxon can properly have only one "correct"/"valid" name (some exceptions in higher ranks) from any one particular taxonomic perspective. This is the whole purpose of the nomenclature Codes. The problem (if it is that) is that there may exist any number of taxonomic perspectives, each representing a particular scientific approach. These may be regarded as several mini-universes, which are mutually exclusive. In each mini-universe a taxon may have a different name, but only one name at a time. For it to have a different name, there needs to be a switch to a different mini-universe.
  • The "instance of: taxon" is a historical oddity. Basically it means nothing more than that a P225 statement is present, and "instance of: taxon name" could also have been used. However, "instance of: taxon" is not wildly inaccurate, as such things go, since any taxon name is not only a name, but is also used to refer to a taxon (that is what it is there for). Somebody, somewhere, somewhen did use it to refer to a taxon, at least once. - Brya (talk) 18:15, 19 March 2019 (UTC)


  • I would like to make a couple of comments from the point of view of a nomenclatural taxonomist. However I work with the ICZN code and the other four codes of nomenclature are different in some aspects.
    • A name on its own for the species level cannot be used without a genus. However names have an original combination and then can have multiple subsequent combinations. One way would be to have q items for every original combination and include on that page subsequent usages. This would then link to the q item of the current combination. This can likewise be done for synonyms. I am still only referring to species level only. For example the original combination for the mata mata is Testudo fimbriata, it was later moved and became Chelus fimbriata before its spelling was corrected (gender agreement) to Chelus fimbriatus. All this info would go on the q item for Testudo fimbriata with original refs, then link the item to the current species page for the Mata mata. As there are some 15 synonyms of the Mata mata you would need to do this for all of them. Which leads to a problem posed above which is that this is a lot of work.
    • Higher orders would be simpler but also follow the same principle for synonyms. It would still be a lot of work.
    • As noted above it would be easier to start this on small groups (at order level) to trial out the best way to do it before doing anything to complex groups such as vertebrates which can have 15-20 synonyms per species.
    • I agree that you should list the type specimen and the type locality.
    • This would be a high-maintenance and highly specialised endeavour. Do you really have editors who can do this at a significant rate and who have the skills to do it? If not (and be honest), this is a massive undertaking that would require detailed policies for new editors to learn exactly what you're doing.
    • Apart from the initial undertaking, this is also a significant undertaking in terms of maintenance. In reptiles over the last 20 years the number of species has gone from 6000 to 10500, with many new combinations and synonymies proposed. This is just the reptiles. The people doing this also have to maintain it; this can mean up to 10 or even more edits per year per page, major updates that would affect multiple pages at the same time. Of course this is actually ignoring anything outside the major divisions, so I have ignored anything prefixed by sub or suffixed by inae.
    • Lastly you will need your editors of this to be very aware of the difference between nomenclature and taxonomy, to truly understand terms such as type, taxon, and many more, ie memorise the glossary of the code. You are going to have to make some decisions, there are many instances of multiple papers coming out with differences of opinion on what the correct nomenclature for a group is. How will you decide which one to use? In other words some groups are extremely dynamic and in a state of flux.
  • In summary, although there are some great ideas here, I think this is a big ask of the relatively few people who are well versed enough in the complex field of nomenclature to actually do it.
Scott Thomson (Faendalimas) talk 20:00, 19 March 2019 (UTC)
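
As a starting point for the worked example asked for in this section, the Hyacinthus table can be written down as plain data: the numbered taxa 1 to 6 are the circumscriptions, and each classification system (one "mini-universe" in the sense used above) maps names to taxa. This is only a sketch under assumptions: the dictionary layout and the function names are illustrative and are not a proposal for concrete Wikidata properties.

```python
# Taxa are identified by the numbers 1-6 used in the table; names are
# rank-plus-name strings. Each system maps a name to exactly one taxon.
# Per the text above, the left-hand system has no name for taxon 2.
LEFT = {
    "Order Asparagales": 1,
    "Family Hyacinthaceae": 3,
    "Subfamily Hyacinthoideae": 4,
    "Tribe Hyacintheae": 5,
    "Genus Hyacinthus": 6,
}
RIGHT = {
    "Order Asparagales": 1,
    "Family Asparagaceae": 2,
    "Subfamily Scilloideae": 3,
    "Tribe Hyacintheae": 4,
    "Subtribe Hyacinthinae": 5,
    "Genus Hyacinthus": 6,
}
SYSTEMS = {"left": LEFT, "right": RIGHT}


def names_for_taxon(taxon_id):
    """Names used for one taxon (group of organisms), per system."""
    return {
        system: sorted(name for name, taxon in mapping.items() if taxon == taxon_id)
        for system, mapping in SYSTEMS.items()
    }


def taxa_for_name(name):
    """Taxa that one name refers to, per system that uses the name."""
    return {
        system: mapping[name]
        for system, mapping in SYSTEMS.items()
        if name in mapping
    }


print(names_for_taxon(4))
print(taxa_for_name("Tribe Hyacintheae"))
```

The two calls reproduce the two bullet points under the table: taxon 4 has a different name in each system, and "Tribe Hyacintheae" cannot be resolved to a taxon without knowing which system is being used.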

Domain names

How are Internet domain names supposed to be modelled in Wikidata? Should they have their own items?

  • At least one Wikidata property, Alexa rank (P1661), is linked to particular domains, rather than to the service(s) hosted on those domains.
  • Domains and subdomains do not necessarily have one-to-one relationships with services. For example, both Google Images and Google Maps are currently served through, and Google has a large number of international domain names through which those services are also served. It would be difficult to map Alexa rank (P1661) for these services, since either (1) all the data would have to be linked to one service or one company, or (2) all of the data would have to be duplicated across all of the items.
  • Domain names may have their own properties: their owners, when they were first registered, their HTTPS information, and so on, which would be difficult to model without items for those domain names.
  • Services can change domain names (e.g. (Q5614018)/The Guardian (Q11148), Wikia (Q17459)). This can be reflected using official website (P856), but it makes it much more difficult to associate data with the domains themselves.
  • Some subdomains (e.g. for Tmall (Q2829108) and Tumblr (Q384060)) also have their own Alexa data. It could be appropriate to create items for them.

Jc86035 (talk) 10:12, 17 March 2019 (UTC)
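
One way to picture the option of giving domain names their own items is a many-to-many link between services and domains, with domain-level data (such as an Alexa rank) stored once on the domain. The following is only a sketch under assumptions: the class names, fields, domain names and rank value are invented for illustration and do not describe existing Wikidata items or properties.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DomainItem:
    """Hypothetical item for a domain name, carrying domain-level data."""
    name: str
    alexa_rank: Optional[int] = None
    owner: Optional[str] = None


@dataclass
class ServiceItem:
    """Hypothetical item for a service, linked to the domains that serve it."""
    label: str
    domains: List[DomainItem] = field(default_factory=list)


def services_on(domain_name, services):
    """Labels of all services served through the given domain."""
    return [
        service.label
        for service in services
        if any(domain.name == domain_name for domain in service.domains)
    ]


# Invented example data: two services share one domain.
shared = DomainItem(name="shared.example", alexa_rank=1)
images = ServiceItem("Example Images", [shared])
maps = ServiceItem("Example Maps", [shared, DomainItem("maps.example")])

print(services_on("shared.example", [images, maps]))
```

With this shape the rank is neither forced onto a single service nor duplicated across services, which are the two drawbacks listed in the second bullet point.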

Missing edit buttons

Is anyone else having interface issues? Jc86035 (talk) 12:02, 17 March 2019 (UTC)

I did and hope I fixed it. There was an error in MediaWiki:Gadget-SimpleTransliterate.js. Matěj Suchánek (talk) 12:11, 17 March 2019 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. Matěj Suchánek (talk) 18:45, 19 March 2019 (UTC)

MESH ID should be split?

--Vladimir Alexiev (talk) 13:43, 17 March 2019 (UTC)

I agree with this, basically. But when it comes to "absolutely no need to scrape it", then I disagree. I can see use cases for having the information on Wikidata (which of course motivates the proposal, filed under "authority control" even though it is not directly about an identifier). One that I didn't mention on the proposal page relates to the WikiJournal, and some good kind of "hovercard" (see mw:Page Previews) for it involving MeSH. Charles Matthews (talk) 06:48, 18 March 2019 (UTC)

New tool: QuickCategories

Hi folks! I want to announce a new tool I’ve been working on: QuickCategories (documentation), a tool to quickly add or remove categories from pages. It’s not especially useful for Wikidata directly, but it’s kind of Wikidata-adjacent, so I still want to announce it here :D

I assume most people here are familiar with QuickStatements. Harmonia Amanda suggested that something similar for categories instead of statements would be useful, and so that’s what I built. You specify the page to edit and the categories to add or remove in a big text box:

Page 1|+Category:Category to add|-Category:Category to remove
Page 2|+Category:Category to add|-Category:Category to remove

Like in QuickStatements v2, you can use keyboard-friendly | or spreadsheet-copy+paste-friendly Tab characters as separators (or mix them). At the moment, there’s no support for running commands in the background (in that respect it’s more like QuickStatements v1), but I plan to add that in the future (hopefully soon).

You can also generate commands via a Wikidata query. For example, running this query lists all Members of Parliament of the United Kingdom whose Commons category is not a subcategory of commons::Category:Politicians of the United Kingdom. You can copy the last two columns of the results (in Firefox, hold down Ctrl while dragging the mouse across them) and paste them directly into the tool.

The tool supports all Wikimedia wikis, but it’s probably especially useful on Commons, that’s why for now I’m only announcing it there and here. Feel free to copy or crosslink the announcement on the village pump (equivalent) of other wikis you think might be interested, or let me know if you think I should do it. If you have any questions, please contact me on the tool’s talk page, preferably with a {{Ping}} (I don’t check my watchlist on Meta that often). --Lucas Werkmeister (talk) 21:57, 17 March 2019 (UTC)
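
For anyone preparing a batch outside a spreadsheet, the command format shown above is simple enough to generate with a few lines of code. This is a minimal sketch, assuming only the `Page|+Category:…|-Category:…` line format from the announcement; the function name and its parameters are made up here and are not part of the tool.

```python
def quickcategories_line(page, add=(), remove=(), sep="|"):
    """Build one command line: page, then +Category:/-Category: entries."""
    if sep not in ("|", "\t"):
        raise ValueError("separator must be '|' or a Tab character")
    names = [page, *add, *remove]
    if any("|" in name or "\t" in name for name in names):
        raise ValueError("page and category names must not contain a separator")
    parts = [page]
    parts += [f"+Category:{category}" for category in add]
    parts += [f"-Category:{category}" for category in remove]
    return sep.join(parts)


print(quickcategories_line("Page 1", add=["Category to add"], remove=["Category to remove"]))
```

Passing `sep="\t"` produces the Tab-separated variant that pastes cleanly from spreadsheets.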

Request for protect:Q179294

This page has been vandalized for a long time by User:Jesamsex and other IPs, sock puppets of User:Unypoly, who add a duplicate Korean version article w:ko:환자 (역사) and separate out other Asian language versions.--Zhxy 519 (talk) 01:38, 18 March 2019 (UTC)

Wikidata:Database reports/Complex constraint violations/P935

Why does the second complex constraint not yield any results? If I run the query on the talk page, I get 12000 results. 08:15, 18 March 2019 (UTC)

Washout of dam and bridge by Niobrara river

Any Nebraskans here? I tried to start making some sense of the flooding occurring from the Spencer dam blowout like my edit here U.S. Route 281 (Q2175059). How can an interstate blockage be modelled? Thx. Jane023 (talk) 09:13, 18 March 2019 (UTC)

@Jane023: Try significant event (P793) with point in time (P585). Snipre (talk) 10:53, 18 March 2019 (UTC)
Sorry, I think the best is
Snipre (talk) 11:05, 18 March 2019 (UTC)
Thanks - I ended up making a WP article for Spencer Dam (Q34942691) since it was done on dewiki through cebwiki (see? Cebuano wiki is good for these things!). I suppose the significant event on the interstate can be linked to the bridge that is gone (some memorial highway, can't find it though). What a mess. Turns out the dam was scheduled to be decommissioned. Jane023 (talk) 11:58, 18 March 2019 (UTC)

How to deal with a large quantity of duplicates

There are currently two and a half sets of items for the 450 or so constituencies of the District Councils of Hong Kong (Q836365).

  • One mostly complete set (434 items) was created by's bot, Taiwan democracy common bot, without matching the items to sitelinks on the English or Chinese Wikipedias. These items currently have the label format "District Councils Constituency in [name], [district]"/"[district][name]區議員選區". (I think this is incorrect, since the labels should generally match the Wikipedia article titles.)
  • One partially complete set (284 items) has been created over time through auto-creation based on English Wikipedia pages. This overlaps with the mostly complete set (462 items + 7 pages with no item) auto-created from Chinese Wikipedia pages, but there are a number of duplicate items here as well.

In total, there are 1,036 items (not accounting for errors; e.g. w:en:Tung Chung North (constituency) appears to be about two different districts which have each had that name). Would it be possible to auto-match these, or does someone have to go through all of these manually? Jc86035 (talk) 10:03, 18 March 2019 (UTC)

I think either way it would need to be done by someone who understands Chinese, since with items like Q61057511 the only thing you've got for matching is the Chinese label. Ghouston (talk) 01:15, 19 March 2019 (UTC)
@Ghouston: (As noted on my user page, I can speak Chinese.) It would be fairly trivial to match the items, assuming that's labels are accurate. I didn't realize that QuickStatements could merge items, so that basically resolves the issue. Jc86035 (talk) 08:34, 19 March 2019 (UTC)
I've cleaned up just about all of the items (total 582 merges, leaving 431 constituency items with the correct statements). Jc86035 (talk) 16:41, 19 March 2019 (UTC)
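
For anyone facing a similar clean-up, the label-matching step described above could be sketched roughly as follows. This is only an illustration, not what was actually run: the item IDs, labels, district list and the `normalize` helper are all made up, and the `MERGE` line format is the QuickStatements V1 merge command as I understand it.

```python
from collections import defaultdict

# Assumption: bot-created Chinese labels look like "[district][name]區議員選區".
SUFFIX = "區議員選區"
DISTRICTS = ["某區"]  # hypothetical placeholder; a real list would name all districts


def normalize(label):
    """Reduce a label to the bare constituency name (illustrative rules only)."""
    label = label.strip()
    if label.endswith(SUFFIX):
        label = label[: -len(SUFFIX)]
    for district in DISTRICTS:
        if label.startswith(district):
            label = label[len(district):]
            break
    return label


def merge_commands(items, key=normalize):
    """Group (qid, label) pairs by normalized label and emit one MERGE line per duplicate."""
    groups = defaultdict(list)
    for qid, label in items:
        groups[key(label)].append(qid)

    commands = []
    for qids in groups.values():
        # Keep the item with the lowest numeric ID as the merge target.
        ordered = sorted(set(qids), key=lambda q: int(q[1:]))
        target = ordered[0]
        for duplicate in ordered[1:]:
            commands.append(f"MERGE\t{duplicate}\t{target}")
    return commands


# Made-up sample data: two items for the same constituency, one unrelated item.
sample = [
    ("Q300", "示例"),
    ("Q100", "某區示例區議員選區"),
    ("Q200", "其他"),
]
commands = merge_commands(sample)
```

Names shared by more than one district (as with the Tung Chung North case mentioned above) would collapse into one group under this normalization, so the output still needs a manual check before it is pasted into QuickStatements.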

Open Company Data Donation

As a side effect of one of my commercial projects I have curated a dataset of companies which I would like to donate. I started to build a company search engine (work in progress), but the data might do more good for more people here at Wikidata. Some of the columns which could be suitable for Wikidata are: company name, company HQ address, founding date, number of employees (bracketed: 1-50, 50-200, ... 1000+), government business legal entity ID, founder names, and a list of references to the company in the news (not all of this data is visible in the search engine at the moment).

I derive the data from multiple sources: the homepage of the company, government data repositories like Companies House, and news written about a company; in some cases it is directly submitted or verified by the company owner.

I currently have more than 3 million companies, but we will probably want to filter this down to only those with doubly verified information.

Rubenwolff (talk)

@ChristianKl: Not sure what kind of permissions this bot would need but I went ahead and named it Rubenwolff (talk) 16:28, 18 March 2019 (UTC)

Calabi–Yau threefolds in $${\mathbb {P}}^6$$ P 6

The title of the item at first glance looks like some sort of vandalism; however, Calabi–Yau threefolds in ℙ⁶ (Q59472728) is absolutely legit, but our QuickStatements bot is not math-friendly, it seems. See the original scientific article. Is it serious enough to file an issue? Kpjas (talk) 12:42, 18 March 2019 (UTC)

Both labels and titles are supposed to be Unicode, not MathML or wikitext. It is up to the person who enters the data to provide it in Unicode; it is not the job of QuickStatements to do that conversion. ℙ⁶ is well supported by Unicode. ChristianKl❫ 13:06, 18 March 2019 (UTC)
@ChristianKl: but mass importers rarely review what eventually lands in WD items, do they? Kpjas (talk) 13:33, 18 March 2019 (UTC)
@Sohmen: Can you make sure that in future the data you enter is Unicode? ChristianKl❫ 15:52, 18 March 2019 (UTC)
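
A pre-import check along these lines might catch such titles before they are uploaded. This is a rough sketch, not part of QuickStatements or any existing tool: the function names are made up, and the conversion only handles the one narrow pattern seen in this title (a double-struck letter raised to a numeric power, optionally followed by the plain-text fallback).

```python
import re

# Double-struck capitals commonly used in mathematics, and superscript digits.
DOUBLE_STRUCK = {"C": "ℂ", "N": "ℕ", "P": "ℙ", "Q": "ℚ", "R": "ℝ", "Z": "ℤ"}
SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

# Anything that looks like TeX delimiters, TeX commands or HTML/MathML tags.
MARKUP = re.compile(r"\$\$|\\[A-Za-z]+|</?[A-Za-z][^>]*>")

# The specific pattern from this thread: $${\mathbb {P}}^6$$ followed by " P 6".
BB_POWER = re.compile(r"\$\$\{\\mathbb \{([A-Z])\}\}\^(\d+)\$\$(?: \1 \2)?")


def looks_like_markup(label):
    """Return True if the label still contains TeX or tag-like markup."""
    return MARKUP.search(label) is not None


def to_unicode(label):
    """Replace the narrow 'double-struck letter to a power' pattern with Unicode."""
    def replace(match):
        letter = DOUBLE_STRUCK.get(match.group(1))
        if letter is None:
            return match.group(0)  # unknown letter: leave untouched for manual review
        return letter + match.group(2).translate(SUPERSCRIPTS)

    return BB_POWER.sub(replace, label)


title = r"Calabi–Yau threefolds in $${\mathbb {P}}^6$$ P 6"
cleaned = to_unicode(title)
needs_review = looks_like_markup(cleaned)
```

Anything that `looks_like_markup` still flags after conversion would have to be fixed by hand before the batch is run.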

Wikidata weekly summary #356

Removing mandatory constraints when data quality is low?

As I just got into a little edit war at Common Database on Designated Areas ID (P4762) with @Abián: and @Sjoerddebruin:, I have to ask: do we now remove mandatory constraints simply because our data does not match them? In this case there are currently 2000+ items which lack the second identifier that they should have; we "only" need to fix our data to satisfy that constraint again. It is just one week since @GPSLeo: added only the one identifier, so I would consider this "work in progress" (see the history of Wikidata:Database reports/Constraint violations/P4762). And even if there really were so many violations for months or years, hiding the fact that our data is incomplete will not solve the problem! Only if they show prominently will someone choose to work on them. 16:48, 18 March 2019 (UTC)  – The preceding unsigned comment was added by Ahoerstemeier (talk • contribs) at 17:48, 18 March 2019‎ (UTC).

Can't you just make it "non-mandatory"? I would think "mandatory" implies it needs immediate fixing (within hours). If it's not that urgent, it shouldn't be marked mandatory. ArthurPSmith (talk) 17:48, 18 March 2019 (UTC)
@Ahoerstemeier: don't edit war. A constraint that has 2172 constraint violations shouldn't be marked as mandatory. Solution is quite simple. You clean up the constraints so the number is zero and you add the mandatory constraint again. Everyone will be happy. Multichill (talk) 18:19, 18 March 2019 (UTC)
Great, put the work and blame on the messenger. So what we need is a new constraint type "could be mandatory once someone cleans up the data". 11:35, 19 March 2019 (UTC)
No, we don't. Matěj Suchánek (talk) 18:39, 19 March 2019 (UTC)

East Jerusalem

Hi, I have a problem with places located in the Israeli-occupied territories, that is, the territories Israel has occupied since the 1967 war.

After 1967, Israel unilaterally annexed part of East Jerusalem; this annexation is accepted by exactly 0 other countries. My 2 cents is that we should follow what the international community says, and therefore should not say that these places are in Israel.

Comments? (It was partly discussed here.) Huldra (talk) 20:55, 18 March 2019 (UTC)

In these situations, we should include all the most relevant points in the data. I outlined many of the relevant issues on relating countries and territories and subdivisions at Wikidata:Project_chat/Archive/2018/10#Countries_and_their_subdivisions_and_territory. We need to be able to specify recognition, administration, claims, control, domestic status, subdivision structure source, etc. so that any relevant attribute can be queried. The difficulty is figuring out a data model. In the case of East Jerusalem, we must include the data that the area is administered, controlled, and claimed by Israel, and the data that it is not recognized internationally as being part of Israel. We need to establish a broad data model for dealing with this and the dozens of similar cases, rather than simplifying each situation to pick whichever side of the conflict is more popular among the people arguing the point at the moment. --Yair rand (talk) 04:29, 19 March 2019 (UTC)
Don't we already have enough qualifiers to represent both points of view?--Ymblanter (talk) 20:33, 19 March 2019 (UTC)
Thank you Yair rand, that was interesting. And yes, I absolutely agree; we should get a consensus for this, as it concerns all the articles in East Jerusalem (i.e., quite a lot).
Actually, we are in a somewhat similar situation for the articles about the Golan Heights: also occupied and annexed by Israel since 1967, while the international community still considers it a part of Syria.
To start with "the top" question, that is, what country they are in:
If (and it's a big IF) we have a country (P17) label, then we should have, say 2 answers;
first: Country A (with the subfield that this is according to the international community, with the exception of country 1, 2 and 3)
second: Country B (with the subfield that this is according to country 1, 2 and 3)
Or: We do NOT use a country (P17) label at all for these places, but only the territory claimed by (P1336) label. Then we will still need a field indicating who accepts the claim. Comments? Huldra (talk) 20:48, 19 March 2019 (UTC)
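
To make the first option concrete, here is a minimal sketch of how two country (P17) values with "according to" information could be stored and queried. It is purely illustrative: the `Statement` fields, the `values_for` helper and the placeholder names are assumptions, not existing Wikidata properties or qualifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    prop: str
    value: str
    # Parties that explicitly hold this view.
    according_to: frozenset = frozenset()
    # True if the view is held generally, apart from the listed exceptions.
    general: bool = False
    except_for: frozenset = frozenset()


def values_for(statements, prop, party):
    """Return the values of `prop` as seen from the point of view of `party`."""
    return [
        s.value
        for s in statements
        if s.prop == prop
        and (party in s.according_to or (s.general and party not in s.except_for))
    ]


dissenters = frozenset({"Country 1", "Country 2", "Country 3"})
place = [
    Statement("P17", "Country A", general=True, except_for=dissenters),
    Statement("P17", "Country B", according_to=dissenters),
]

view_of_dissenter = values_for(place, "P17", "Country 1")
view_of_other = values_for(place, "P17", "Country 9")
```

The point of such a structure is that both views stay in the data and a query can select either one, rather than the item recording only a single answer.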

Incorrect Wikisource links

We have a number of items with links to Wikisource, for example:

which link to works about the subject of the item, not to an equivalent Author: namespace page.

How can we fix this? And how can we use an edit filter or similar to prevent recurrence? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:46, 18 March 2019 (UTC)

Fix by moving the Wikisource link to described by source (P1343)? Ghouston (talk) 22:08, 18 March 2019 (UTC)
and optionally create an article like Elliott, Stephen (Q28032076), although it seems like overkill for a one-paragraph encyclopedia article. Ghouston (talk) 22:53, 18 March 2019 (UTC)
You'd need to create the data item regardless in order to use described by source (P1343). Beleg Tâl (talk) 23:07, 18 March 2019 (UTC)
Not necessarily, Albigence Waldo Cary (Q18819010) already has such a statement that just links to Appletons' Cyclopædia of American Biography (Q12912667). Ghouston (talk) 00:03, 19 March 2019 (UTC)
Can described by source (P1343) have a URL qualifier? There's also described at URL (P973). Ghouston (talk) 00:09, 19 March 2019 (UTC)
@Ghouston: Albigence Waldo Cary (Q18819010) is set up incorrectly. The link to Wikisource is to a biographical article which will have its own publication data that need to be included. The person who is the subject of the article will not have publication data. So you cannot add a Wikisource link that way. It may seem like "overkill" to you, but unless the full information for the citation of the article is included, then citation data for the article cannot be pulled from the data item. See for example DGRBM-1870 / Oedipus (Q47507582), which contains full publication data allowing the linked article to be cited, by using the data in the corresponding data item. With all the publication data available in the data item, a Wikipedian wishing to cite the article could use a tool to generate the citation, without having to manually retype all of the data. --EncycloPetey (talk) 01:58, 19 March 2019 (UTC)
I'm not sure; isn't the described by source (P1343) statement on Albigence Waldo Cary (Q18819010), which points to Appletons' Cyclopædia of American Biography (Q12912667), also a valid way of doing it? You can add another qualifier for the page number, and there doesn't seem to be any further publication data. Ghouston (talk) 02:41, 19 March 2019 (UTC)
The described by source (P1343) linking is correct, but not the link in the Wikimedia links section of the data item. There is a lot of other publication data: identity / title of the work in which the article was included, volume / pages in the volume, date of publication, and author of the article. This should all appear in a data item about the article, regardless of the article's length. Every published work gets its own data item, from Leo Tolstoy (Q7243)'s massive novel War and Peace (Q161531) to the 17-syllable Frog Poem (Q11411329) by Matsuo Bashō (Q5676). Length of the publication is irrelevant. --EncycloPetey (talk) 03:00, 19 March 2019 (UTC)
An aside: persons on Wikisource can be either in Author space (if they are authors) or in Portal space (if they are not authors), but never in mainspace (unless the person is themselves a written work). Beleg Tâl (talk) 23:07, 18 March 2019 (UTC)
This is true on most Wikisource projects, but not all. The German Wikisource puts its Author pages in the Main namespace instead of in an "Author:" namespace. E.g. s:de:Johann Wolfgang von Goethe --EncycloPetey (talk) 02:01, 19 March 2019 (UTC)
I would not say "they are incorrect" but that the ontology is not settled. Subjects of encyclopedic articles are notable at Wikidata, so a migration path from Wikisource page to Wikidata item would be nice. We need a systematic way of indicating "depicts" or "is the subject of article" statements, and an author / portal subject infobox at Wikisource. Slowking4 (talk) 11:37, 19 March 2019 (UTC)
The ontology is not at issue. If we allow such links to a Wikisource biographical article from a Wikidata item for a person, then the system fails whenever we have two different articles about the same individual on the same Wikisource project. Only a single link to a particular project is possible from any given data item. So this method would say that "link to the article about the person, unless there is more than one article about the person, in which case, (pick one of them?) (do it differently somehow?)". Any proposed system that breaks that easily is untenable. --EncycloPetey (talk) 14:08, 19 March 2019 (UTC)
Sounds like Wikidata's issues with the different spaces on Commons are not a problem unique to Commons. - Jmabel (talk) 16:14, 19 March 2019 (UTC)
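
As a starting point for finding the affected items, a check on the sitelink title could look something like the sketch below. It assumes English Wikisource only, where pages about persons sit under the "Author:" or "Portal:" prefixes as described above; other Wikisource projects use other conventions (German Wikisource keeps author pages in the main namespace), so they are deliberately not judged. The function name and the sample titles are made up.

```python
# Namespace prefixes under which English Wikisource keeps pages about persons.
PERSON_PREFIXES = ("Author:", "Portal:")


def is_suspect_person_sitelink(site, title):
    """Flag a sitelink on a person item that points outside Author:/Portal: space.

    Only English Wikisource is checked; every other site returns False
    because its namespace conventions are not modelled here.
    """
    if site != "enwikisource":
        return False
    return not title.startswith(PERSON_PREFIXES)


# Hypothetical sitelinks found on items that are instances of human.
checks = [
    ("enwikisource", "Some Cyclopædia/Doe, Jane"),       # article about the person
    ("enwikisource", "Author:Jane Doe"),                  # proper author page
    ("dewikisource", "Johann Wolfgang von Goethe"),       # main namespace is normal there
]
flags = [is_suspect_person_sitelink(site, title) for site, title in checks]
```

Flagged links would then be candidates for moving to a separate item about the article, linked via described by source (P1343) as discussed above.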

A minor enigma

Hello. Q60300562 is a wedding gift (Q60965053) by Pablo Picasso (Q5593) to Guillaume Apollinaire (Q133855). How can I store this? The question comes from the Bistro. Maybe, I'm not sure, we can have significant event (P793) → change of ownership (Q14903979) with object has role (P3831) → wedding gift (Q60965053) and donated by (P1028) → Pablo Picasso (Q5593) as qualifiers, but I'm then left without knowing how to store Guillaume Apollinaire (Q133855) under that very same property. Should this be split into two totally different statements? Do we need a 'transaction type' property? Thierry Caro (talk) 03:33, 19 March 2019 (UTC)
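
For reference, the statement proposed above could be written as a single tab-separated QuickStatements V1 line (item, property, value, then qualifier pairs). The helper below is only a sketch of that layout with a made-up function name; it does not answer the open question of where the recipient should go.

```python
def quickstatement(item, prop, value, qualifiers=()):
    """Build one tab-separated line: item, property, value, then qualifier pairs."""
    parts = [item, prop, value]
    for qualifier_prop, qualifier_value in qualifiers:
        parts.extend([qualifier_prop, qualifier_value])
    return "\t".join(parts)


line = quickstatement(
    "Q60300562",            # the gift itself
    "P793",                 # significant event
    "Q14903979",            # change of ownership
    qualifiers=[
        ("P3831", "Q60965053"),  # object has role: wedding gift
        ("P1028", "Q5593"),      # donated by: Pablo Picasso
    ],
)
```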

OCLC Control number (P243)

Currently this is an instance of Wikidata property to identify books. Can this be changed to be an instance of Wikidata property for an identifier? (OCLC includes many more items than books, including all forms of published and manuscript material, realia, even service dogs.) It would be good to have this property to describe non-book items.

Q925929 Leroy P. Steele Prize

Could someone remove all 61 awardees of this award (P166)? That would allow me to restructure the award into its component parts. Thanks, GerardM (talk) 18:26, 19 March 2019 (UTC)