Lojban gismu etymologies - anyone interested?
- To: John Cowan <cowan@SNARK.THYRSUS.COM>,       Eric Raymond <eric@SNARK.THYRSUS.COM>,       Eric Tiedemann <est@SNARK.THYRSUS.COM>
- Subject: Lojban gismu etymologies - anyone interested?
- From: Logical Language Group <cbmvax!uunet!GREBYN.COM!pucc.PRINCETON.EDU!lojbab>
- Reply-To: Logical Language Group <cbmvax!uunet!GREBYN.COM!pucc.PRINCETON.EDU!lojbab>
- Sender: Lojban list <cbmvax!uunet!CUVMA.BITNET!pucc.PRINCETON.EDU!LOJBAN>
I have just finished the 2nd and nastiest step in building a record of the
etymologies of the Lojban gismu.  This consisted of going through some
50 megabytes of output files to determine the runs that actually generated
the words, and the source data, in Lojbanized form, for each of the runs.
The resulting file is the first one that is really usable for tracking
etymologies.  Its main limitation is that the 6 source language entries for
each word are in Lojbanized form.  You would therefore probably need to
know the language in question to backtrack and figure out the actual
Chinese or Russian word used.  You would also need to recognize some (probably
fairly obvious) conventions we used in Lojbanizing, like dropping some
declension suffixes to get the important part of the root.
I've worked about 2 years off and on to get this far.  The last step, getting
actual source words into the file, will probably take several more years unless
a local volunteer takes it on, since you have to go into the one-of-a-kind
raw data notebook and hand enter all the words.
On the other hand, I want to make the new etymology list available if anyone
thinks they can use it.  On disk, the step 1 file is 180K, consisting of
all the gismu that were selected, as well as rejected choices due to word
conflicts, lack of rafsi, and words that we dropped from the gismu list after
doing the data runs.  The step 2 file with the etymology for each of the
words that was actually chosen is 280K.  (Both files compress significantly.)
Printed, the files would be some 40 pages and 80 pages respectively, and we'd
have to charge 10c/page or more.
There will probably have to be some demand shown from the community if
these files are to be made available on PLS.  If not, we could still send
the data on floppies, but I can't afford to mail these things individually
to people who request them.
So let me know if you are interested.
lojbab@grebyn.com