Aura's Den



Adventures in Conlanging - Electronic Cataloguing

Posted 06 December 2020

Let me start by saying this is a rewrite of an old thread I did on the fediverse a while back. So, if you follow me over there and this seems a bit familiar, that's probably why. That said...

I like conlanging. Quite a bit, actually. To date, I've worked on no fewer than 7, with a few more that never quite got past the phonology and maybe some very basic grammar bits. Most of this has been done on paper, with notebooks upon notebooks dedicated to various conlangs I've worked on at various points in time. But writing dictionaries on paper can be... inconvenient, to say the least. It can be difficult to find a given word, figure out how similar a word is to an existing word, alphabetize things, or add stuff to an entry after adding others after it. So, for this reason, I prefer to do digital cataloguing. What follows here is some of the various systems I've used and my thoughts on them.

I should also say that while I will give my recommendations, I will not prescribe anything. If you like working on paper, more power to ya. Maybe you're using index cards or something. I dunno. Similarly, if you've got some electronic solution you prefer, whether it be text files, a spreadsheet, or some dictionary software I've not mentioned here, that's all well and good. None of what I'm about to say means you shouldn't use those. These are my thoughts on what I've used. Feel free to take 'em or leave 'em.

Of course, this isn't a comprehensive list. There are lots of systems out there that I've not used. These are just the ones I've used. So, definitely feel free to look around for other solutions, whether they be programs or file formats, especially if nothing I'm going to talk about suits your fancy.

Finally, everything I'm going to talk about here is free. I will be talking about the Abbyy Lingvo DSL format, and I know Lingvo isn't free, but the DSL format is out there for anyone to use, and is supported by free (as in beer) software like GoldenDict. So, all of this is available to all of you, at least in price, if not other regards. So, with all this said, let's talk about...

SIL Lexique Pro

So, Lexique was actually the software I started with, but I've honestly stopped using it. It's not that I don't like it, per se, but it's older software, Windows-only and I've never been able to get it running on Linux, and I've found more useful software. Honestly, though, if you're getting started with this stuff, it's very useful.

A screenshot of Lexique running on Windows XP. It's got a database for Pachli open with info on the word ielfa.
Lexique on what is now somehow a retro operating system. Where did the time go?
(Also, I messed up the example sentence. It should be "shefa malanet ke se ielfe", but I'm not gonna fix that screenshot.)

Lexique is very simple to use, and it's very easy to create databases for new languages. After all, it is designed for previously uncatalogued languages. It gives you a lot of options for things to document. In the screenshot above, you can see the word "ielfa". In addition to the part of speech and definition, we have the plural form, a sample sentence with translation, and the etymology of the word. This isn't all that's available, though. It allows cataloguing grammatical stuff, like verb forms; various information, including encyclopædic information; cross-references; synonyms and antonyms; and notes by category. So, if you want separate anthropological and historical notes, it's got that. (NB: It may not have historical notes. But there are multiple note categories.) It's got a lot, and yet it's still incredibly simple to use: a fairly basic program with advanced functionality.

Honestly, I'd forgotten why I liked it so much, but using it for that screenshot, I remember now. It is very nice to use. As for downsides, it is an older, unsupported application. Additionally, its database format isn't portable. That said, I know that it runs just fine on Windows 10, and it's not that the database couldn't be portable. It's all in plain text, but no one's written a converter for other formats. Honestly, it's very nice, and I would absolutely recommend it if you're looking to get started with electronic cataloguing or looking to move from an ad hoc solution to a proper solution.
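Since the database is plain text, a converter wouldn't have to be fancy. Here's a minimal Python sketch of the reading half. It assumes (and this is an assumption, not something checked against Lexique's own files) that entries are stored as backslash-coded fields in the style of SIL's standard format, with `\lx` starting each entry. The marker names and the sample data are illustrative only, so look at your own database before trusting any of it.

```python
def parse_sfm(text):
    """Split backslash-coded text into entries.

    Each entry is a dict mapping a marker (without the backslash)
    to the list of values seen for it, in file order.
    """
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("\\"):
            continue  # this sketch skips blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":  # assumption: \lx begins a new entry
            current = {}
            entries.append(current)
        if current is not None:
            current.setdefault(marker, []).append(value.strip())
    return entries


# Hypothetical sample data; the markers here are assumed, not confirmed.
SAMPLE = r"""
\lx ielfa
\ps n
\ge wolf
\pl ielfas
"""

entries = parse_sfm(SAMPLE)
print(entries)
```

From a list of dicts like this, writing out some other format is mostly a matter of string formatting.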

Abbyy Lingvo .dsl Files

So, on Linux, I wanted a Linux-native solution, which brought me to what GoldenDict supported. The first one I tried was the Lingvo dictionary source format. Honestly, it's just a fancy text file but with the benefit of being able to be loaded by dictionary software. Don't believe it's a fancy text file? Here's an example:

          
#NAME "Pachli Dictionary"
#INDEX_LANGUAGE "Pachli"
#CONTENTS_LANGUAGE "English"

ielfa
    A local canid similar to [i]canis lupus[/i]
    [ex]Śefa malanet ke se ielfe[/ex]
          
        

See? Pretty simple. Honestly, change up the formatting of your text files, and it'll probably be good enough. The biggest problem with it, at least for me, is how limited it is. Like, if you've been using a text file, it's a gentle transition, but for me, who started on Lexique? It was a bit underwhelming. Additionally, GoldenDict's support for formatting codes can be a bit iffy. Not everything will work. The codes will still be there in the file; they just won't render as anything. All in all, if you just want to be able to load a text file in dictionary software, this is (maybe) a simple enough conversion. Otherwise, I'd say the other things here are probably a better option.
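If your word list already lives somewhere else (a spreadsheet export, say), generating the .dsl text is a few lines of code. This is a minimal Python sketch, not a full implementation of the format: `to_dsl` is a made-up helper name, it only writes the three header lines used in the example above, and it relies on the convention that headwords start at the beginning of a line while body lines start with whitespace.

```python
def to_dsl(name, index_lang, contents_lang, entries):
    """Build DSL text from a dict of headword -> list of body lines."""
    lines = [
        f'#NAME "{name}"',
        f'#INDEX_LANGUAGE "{index_lang}"',
        f'#CONTENTS_LANGUAGE "{contents_lang}"',
        "",
    ]
    # sorted() uses Unicode code point order, which may not match
    # your conlang's own alphabetical order.
    for headword, body in sorted(entries.items()):
        lines.append(headword)                 # headword at column 0
        lines.extend("\t" + b for b in body)   # body lines are indented
        lines.append("")
    return "\n".join(lines)


dsl_text = to_dsl(
    "Pachli Dictionary",
    "Pachli",
    "English",
    {
        "ielfa": [
            "A local canid similar to [i]canis lupus[/i]",
            "[ex]Śefa malanet ke se ielfe[/ex]",
        ]
    },
)
print(dsl_text)
```

The function only returns a string; saving it to a file, and picking whatever encoding your dictionary program wants, is left to you.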

XDXF

Oh, geez. XDXF. Yikes.

The upside? It is very powerful. It has the ability to document lots of stuff. The downside? It's all XML. If you aren't comfortable with XML and/or HTML, stay the fuck away. Don't be lured in by its power. It will crush your will to live. Here's a sample:

          
<xdxf lang_from="PCH" lang_to="ENG" format="logical" revision="33">
  <meta_info>
    <title>Pachli Dictionary</title>
    <full_title>kelstarbora-pachli</full_title>
    <publisher>Institute Astaria</publisher>
    <authors>
      <author role="lexicographer">Vulpes, Aura</author>
    </authors>
    <description>An XDXF example. Hello, readers!</description>
    <abbreviations>
      <abbr_def type="grm">
        <abbr_k>n.</abbr_k>
        <abbr_v>noun</abbr_v>
      </abbr_def>
    </abbreviations>
    <file_ver>First and only</file_ver>
    <creation_date>06-12-2020</creation_date>
  </meta_info>
  <lexicon>
    <ar>
      <k>ielfa</k>
      <def>A local canid similar to <i>canis lupus</i></def>
      <gr>Female ielfi; Plural ielfas/ielfis; Accusative ielfe; Dative ielf</gr>
      <tr format="X-SAMPA">iel_0f@</tr>
      <dtrn>wolf</dtrn>
      <ex type="exm">
        <ex_orig>Śefa malanet ke se ielfe</ex_orig>
        <ex_tran>I did not kill the wolf</ex_tran>
      </ex>
      <etm>From Kirai <i>wolva</i>, from Novreni <i>wulvram</i>, from Proto-Novreni *<i>wlvramh</i></etm>
    </ar>
  </lexicon>
</xdxf>
          
        

(Holy shit, that was difficult! Embedding XML in an HTML file is very hard!)

So, yeah. That's what an XDXF dictionary looks like. As you can tell, there's a lot there. Lots you can use it for. But it's very difficult to work with unless you're comfortable with XML and/or HTML.
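One way to take some of the pain out of it is to not type the XML by hand at all. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree` to build a single `<ar>` entry with a handful of the tags from the sample above. `make_article` is a hypothetical helper, it covers only a few tags, and nothing here checks the result against the XDXF specification. The last line uses `html.escape` to produce a version that can be pasted into an HTML page.

```python
import html
import xml.etree.ElementTree as ET


def make_article(headword, definition, gloss, example=None):
    """Build one <ar> element; example is an (original, translation) pair."""
    ar = ET.Element("ar")
    ET.SubElement(ar, "k").text = headword
    ET.SubElement(ar, "def").text = definition
    ET.SubElement(ar, "dtrn").text = gloss
    if example is not None:
        ex = ET.SubElement(ar, "ex", type="exm")
        ET.SubElement(ex, "ex_orig").text = example[0]
        ET.SubElement(ex, "ex_tran").text = example[1]
    return ar


ar = make_article(
    "ielfa",
    "A local canid similar to canis lupus",
    "wolf",
    ("Śefa malanet ke se ielfe", "I did not kill the wolf"),
)

# Serialising through the library closes every tag and escapes
# characters like & and < in the text for you.
xml_text = ET.tostring(ar, encoding="unicode")

# Escaped copy, suitable for showing the XML inside an HTML page.
html_ready = html.escape(xml_text)
print(xml_text)
```

The library won't stop you from putting tags in the wrong place, but it does guarantee that whatever comes out is well-formed XML.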

There's another problem with this, a problem it shares with .dsl. That problem is...

GoldenDict

A short aside, if you will. As dictionary software goes, it's very nice. The problem is that it hates unknown languages. It'll begrudgingly display the above dictionaries, but it won't quite like it. I honestly wish it had a nice way to expand what languages it knows. Some way to tell it that, say, "PCH" is Pachli. But, if you wanna do that, you gotta compile from source, which first means figuring out where it keeps that information and what all you have to change. But, it does work, I guess.

SIL FieldWorks

So, last time I talked about FieldWorks, I called it way too complex for my needs. Since then, I've used it some, and I honestly really love it! I've not really explored outside the dictionary making, but holy shit is it good for conlang dictionary making. Why? Well...

So, SIL, who you may recognize from Lexique above, are quite a few things. Some of them not so good, some of them wonderful. One of the big things they do is work on cataloguing little known languages. In doing so, they've made a lot of software to help them with that, much of which they've released for free to the world. This is all context for the greatest feature of any of these programs: A list of words to record.

Did you get that? A built-in list of words to record. If you conlang, you might be familiar with stuff like the Swadesh List. Or perhaps you've found a word list elsewhere that you use for what words to work on. Whatever it is you've got, FieldWorks has got something similar. There is a mode entitled "Collect Words". The intention is that a linguist in some area can set up a table with their laptop, open it up, and ask locals about words in their native language. Not only does it have basic words (e.g. "how do you say 'dog'"), but it also has more general phrasings (such as "the heavens and earth" for universe). This gives a conlanger an easy way to just start making words for their language. It's brilliant.

(By the way, the other thing you might know them for? Fonts. They're the ones behind Charis SIL, Doulos SIL, and Andika, as well as the authors of the SIL Open Font License. They're also behind the Graphite font rendering system. They're also, unfortunately, a Christian Missionary organization first and foremost, even if their efforts in linguistics are quite admirable. Take that as you will.)

The FieldWorks collect words mode showing the page for Semantic Domain 3.1.2.2, Notice.
It's also really nice in that it separates all these words into semantic domains, allowing you to focus in on specific topics you might need at the moment.

Beyond this, it's just a really powerful way to catalogue your language. Not everyone is going to need this level of complexity, but if you'd like a more complete way to document your conlang, it definitely delivers.

FieldWorks on Windows 7 with the lexicon edit mode open. It's got the stem ielf open.
The worst part is that this is quickly also becoming a retro operating system.

It's got a lot it can do. In addition to the info it records about words, it also has the ability to log texts and translations and to document the various grammatical aspects of the language. You definitely want to go into it with a grammar sketch beforehand, but it can definitely help flesh out a grammar into something more natural.

Honestly, though, I said last time that it's way too complex for cataloguing alone, and, honestly, I kinda feel the same way now. There are plenty of word lists out there to work off of, and Lexique is much simpler if you just want a dictionary. If you want to play with the features it has for cataloguing texts and grammars, I'd definitely recommend it. If you just want a dictionary, maybe stick to something else. Though, to be fair, this has an Ubuntu version, so, unlike Lexique, you could run it natively, which might be something else to consider.

So, yeah. That's the conlanging software I've worked with. Honestly, my recommendations haven't changed that much. I still highly recommend Lexique, even if it's no longer supported. I would actually recommend FieldWorks this time around, even just for cataloguing. .dsl files are still really good if you want something simple that you can load in a dictionary program. And, I'd honestly recommend against XDXF this time around. Sure, it's powerful, but so are Lexique and FieldWorks. Unless you really need something easily portable, I'd just use one of those two.

Keywords: Adventures in Conlanging, Language, Software

