
Re: The gismu creation algorithm



>la lojbab di'e cusku
>> It is certainly a coincidence.  If you think you have a better match
>> for a word, I still have all 20 meg of gismu data runs around [...]
>
>Well, only some bytes are enough :-) I'm not saying different gismu
>would have better scores, only that the current gismu seem to have
>higher scores when letter order is not taken into account.
>
>Let me illustrate my point with one example for each source language;
>this is interesting even if only a coincidence. It seems that if there
>is an ordered match, then a longer unordered match is likely, and you
>didn't have to code a more complex algorithm.
>
>    gismu   etymology         score w/ order   score w/o order
>    -----   ---------------   --------------   ---------------
>    jdari   Chinese 'jian'    2                3
>    fagri   English 'fair'    3                4
>    palta   Hindi   'tal'     2                3
>    canre   Spanish 'aren'    3                4
>    kabri   Russian 'kubak'   2                3
>    sumji   Arabic  'juml'    2                3

Well, in these cases, it is clear that neither the Spanish nor the English
example could be remade so as to get a higher score, since the
out-of-order phoneme is the second vowel, which has to be in final position.
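
For anyone who wants to check the two columns in the quoted table, here is a
rough sketch. It assumes "score w/ order" is the length of the longest common
subsequence of letters and "score w/o order" is the number of shared letters
ignoring position. That is only one reading of the table, not the actual gismu
scoring program, and the function names are made up for illustration.

```python
from collections import Counter

def ordered_score(gismu, source):
    """Assumed 'score w/ order': longest common subsequence length."""
    prev = [0] * (len(source) + 1)
    for g in gismu:
        cur = [0]
        for j, s in enumerate(source, start=1):
            if g == s:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def unordered_score(gismu, source):
    """Assumed 'score w/o order': shared letters counted as a multiset."""
    return sum((Counter(gismu) & Counter(source)).values())

# Pairs copied from the quoted table.
PAIRS = [
    ("jdari", "jian"), ("fagri", "fair"), ("palta", "tal"),
    ("canre", "aren"), ("kabri", "kubak"), ("sumji", "juml"),
]

for gismu, source in PAIRS:
    print(gismu, source, ordered_score(gismu, source),
          unordered_score(gismu, source))
```

Under those assumptions the six rows come out with the same numbers as in the
table.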

The Arabic example is clearly coincidence, since the main etymological
components were "sum" from English (probably reinforced by Spanish, I guess
without verifying) and "ji" from Chinese.  Arabic always loses against
the other languages %^(  "jumji" would have reduced the English score to benefit
Arabic, and "sumli" would have reduced the Chinese score to benefit Arabic.
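
The trade-off can be sketched with plain in-order letter matches. This assumes
the source words are exactly "juml", "sum" and "ji" as spelled in this message,
and it ignores any per-language weighting, so it only illustrates the direction
of the change and is not the real scoring.

```python
def ordered_match(word, source):
    """Longest common subsequence length (unweighted, illustrative only)."""
    prev = [0] * (len(source) + 1)
    for w in word:
        cur = [0]
        for j, s in enumerate(source, start=1):
            cur.append(prev[j - 1] + 1 if w == s else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# Source words as spelled in this message (assumed, not verified keywords).
SOURCES = {"Arabic": "juml", "English": "sum", "Chinese": "ji"}

for candidate in ("sumji", "jumji", "sumli"):
    scores = {lang: ordered_match(candidate, src)
              for lang, src in SOURCES.items()}
    print(candidate, scores)
```

With these inputs, jumji gains one Arabic letter and loses one English letter
relative to sumji, and sumli gains one Arabic letter and loses one Chinese
letter.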

JCB observed a long time ago that most of the gismu consisted of jamming
the English and Chinese together optimally, with the other languages
serving to make minor adjustments.  This is still essentially true, though
on occasion it is Chinese and Hindi, especially when the English is not
reinforced by a Spanish or Russian near cognate.

lojbab