
a verb is a collection of conjugations

2022-12-14

3 minute read

I had a good chat yesterday with a talent manager hiring for a startup. She put me through for a meeting with the CTO, which I've booked in for tomorrow. I'm nervous, and I'll spend a fair chunk of today preparing. However, I don't want to talk about that too much in today's entry.

Today I want to talk about data modelling

and what I've learnt about it over the last day.

I'm still working on conjugação, and I found myself wrestling with the data model I had for the verbs. I don't think the way the data I've got groups conjugations is very good.

Here is a table showing verb conjugations in Portuguese, from Wikipedia:

[Image: table showing Portuguese verb conjugations]

The data I have currently models a conjugation like this:

// indicative present singular 1st person
{
    "form": "s1",
    "group": "indicative/present",
    "group_sort": 5,
    "sort": 0,
    "value": "faço"
}

Where form contains both the grammatical number and person, and group contains the mood, form, and tense. This isn't great because it combines different features of the verb into single properties, and requires a lot of string splitting and testing to determine which conjugation it actually relates to.
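To make the pain concrete, here's a minimal sketch of the kind of string splitting the old shape forces on every consumer. The `describeLegacy` helper is hypothetical, and it assumes that `form` is always a number letter ("s" or "p") followed by a person digit, and that `group` is always "mood/tense". Only "s1" and "indicative/present" actually appear in the example above, so the rest is guesswork.

```typescript
// Shape of the existing records, copied from the example above.
type LegacyConjugation = {
    form: string
    group: string
    group_sort: number
    sort: number
    value: string
}

// Hypothetical helper (not part of the project). Assumes "form" is a
// number letter ("s" or "p") plus a person digit, and "group" is "mood/tense".
function describeLegacy(c: LegacyConjugation) {
    const grammaticalNumber = c.form.charAt(0) === "s" ? "singular" : "plural"
    const person = Number(c.form.slice(1))
    const [mood, tense] = c.group.split("/")
    return { number: grammaticalNumber, person, mood, tense, value: c.value }
}

const legacy: LegacyConjugation = {
    form: "s1",
    group: "indicative/present",
    group_sort: 5,
    sort: 0,
    value: "faço",
}

const described = describeLegacy(legacy)
console.log(described)
```

Every feature has to be recovered by convention rather than read off a property, and any group that doesn't fit the assumed pattern would slip through silently.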

I spent a while trying to build my own model of a verb, and got stuck thinking about it as an object that contains categories to be drilled down into (kind of as in the table above).

Here's a screenshot where I was experimenting with accessing different conjugations:

[Image: screenshot of commented-out code, a variety of imagined ways to access conjugations]

For posterity and accessibility, here's the text:

{/* <p>{JSON.stringify(conjugations)}</p> */}
{/* <p>{verb.pastParticiple().combinedNotation()</p> */}
{/* <p>{verb.indicative().present().singular()[0]</p> */}
{/* <p>{verb.indicative.present.singular[0]</p> */}
{/* <p>{verb.firstPerson.singular.present.indicative</p> */}
{/* verb[mood][tense][number][person] */}
{/* verb.moods.indicative.present.singular[tense][number][person] */}
{/* verb.forms.pastParticiple */}

Each of these felt clunky, frustrating to use in markup and difficult to model elegantly. I finally had a breakthrough when I stopped thinking of a verb as an object with nested properties ending in conjugations, and instead thought about it as a group of conjugations with different properties. Here's what emerged:

type verbType = "FINITE" | "NON-FINITE"
type grammaticalPerson = "FIRST" | "SECOND" | "THIRD"
type grammaticalNumber = "SINGULAR" | "PLURAL"
type mood = "INDICATIVE" | "SUBJUNCTIVE" | "IMPERATIVE" | "CONDITIONAL"
type tense = "PRESENT" | "PRETERITE" | "PLUPERFECT" | "IMPERFECT" | "FUTURE"
type form = "GERUND" | "INFINITIVE" | "PAST-PARTICIPLE" | "PERSONAL-INFINITIVE"

type conjugation = {
    value: string
    type: verbType
    form?: form
    mood?: mood
    person?: grammaticalPerson
    number?: grammaticalNumber
    tense?: tense
}

There may be more granular ways to model this, but I think this is a strong start. A conjugation always has a value and always has a type. The other properties are all optional. Non-finite verbs don't have moods or tenses, and finite verbs don't have forms. A more complex model might split these relationships up further, or provide stricter binding, e.g. "finite verbs MUST have a mood", but for my use case that's not necessary.
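For reference, one way the stricter binding could look is a discriminated union keyed on `type`. This is only a sketch of the idea, not what the project uses: the names `finiteConjugation`, `nonFiniteConjugation`, `strictConjugation` and `label` are hypothetical, and which properties are required on each side is my assumption (for instance, tense stays optional because the imperative doesn't carry one in this model).

```typescript
type grammaticalPerson = "FIRST" | "SECOND" | "THIRD"
type grammaticalNumber = "SINGULAR" | "PLURAL"
type mood = "INDICATIVE" | "SUBJUNCTIVE" | "IMPERATIVE" | "CONDITIONAL"
type tense = "PRESENT" | "PRETERITE" | "PLUPERFECT" | "IMPERFECT" | "FUTURE"
type form = "GERUND" | "INFINITIVE" | "PAST-PARTICIPLE" | "PERSONAL-INFINITIVE"

// Hypothetical stricter variant: a finite conjugation MUST have a mood,
// a person and a number, and cannot have a form.
type finiteConjugation = {
    value: string
    type: "FINITE"
    mood: mood
    person: grammaticalPerson
    number: grammaticalNumber
    tense?: tense
}

// A non-finite conjugation MUST have a form and cannot have a mood or tense.
// Person and number stay optional for the personal infinitive.
type nonFiniteConjugation = {
    value: string
    type: "NON-FINITE"
    form: form
    person?: grammaticalPerson
    number?: grammaticalNumber
}

type strictConjugation = finiteConjugation | nonFiniteConjugation

// Checking "type" narrows the union, so the compiler knows which
// properties are guaranteed to exist in each branch.
function label(c: strictConjugation): string {
    if (c.type === "FINITE") {
        return `${c.value}: ${c.mood}`
    }
    return `${c.value}: ${c.form}`
}

const presentFirst: strictConjugation = {
    value: "faço",
    type: "FINITE",
    mood: "INDICATIVE",
    person: "FIRST",
    number: "SINGULAR",
    tense: "PRESENT",
}
const gerund: strictConjugation = { value: "fazendo", type: "NON-FINITE", form: "GERUND" }

console.log(label(presentFirst), label(gerund))
```

The trade-off is more types to maintain in exchange for the compiler rejecting, say, a gerund with a mood.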

To access subsets I've come up with a helper function that I think is elegant enough (though it could probably do with some performance optimisation in good time).

function query(...filters: (verbType | grammaticalNumber | grammaticalPerson | mood | tense | form)[]) {
    // Start from every conjugation of the verb and narrow the list one filter
    // at a time, so a conjugation only survives if it matches every filter.
    return filters.reduce(
        (remaining, filter) => remaining.filter(conjugation => hasMatch(conjugation, filter)),
        conjugations
    )
}

Because each property has a finite set of possible values, and doesn't share any values with other properties, I don't need to say which property an argument refers to. I can keep only the conjugations where every argument matches one of the properties, and return an array of conjugations that fall within the center of that Venn diagram.
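The post doesn't show `hasMatch`, so here's a minimal, self-contained sketch of how the whole thing could fit together. The `hasMatch` body is my assumption about how such a check might work, the `filter` alias and the `sample` data are just for illustration, and this `query` takes the conjugations as its first parameter (rather than living on a verb) so the block runs on its own. It uses `filter` plus `every`, which should give the same result as narrowing with `reduce`.

```typescript
type verbType = "FINITE" | "NON-FINITE"
type grammaticalPerson = "FIRST" | "SECOND" | "THIRD"
type grammaticalNumber = "SINGULAR" | "PLURAL"
type mood = "INDICATIVE" | "SUBJUNCTIVE" | "IMPERATIVE" | "CONDITIONAL"
type tense = "PRESENT" | "PRETERITE" | "PLUPERFECT" | "IMPERFECT" | "FUTURE"
type form = "GERUND" | "INFINITIVE" | "PAST-PARTICIPLE" | "PERSONAL-INFINITIVE"

type conjugation = {
    value: string
    type: verbType
    form?: form
    mood?: mood
    person?: grammaticalPerson
    number?: grammaticalNumber
    tense?: tense
}

// Illustrative alias for anything that can be passed as a filter.
type filter = verbType | grammaticalNumber | grammaticalPerson | mood | tense | form

// Assumed implementation: a conjugation matches a filter if any of its
// grammatical properties equals it. "value" is deliberately left out so the
// conjugated word itself can never be mistaken for a feature.
function hasMatch(c: conjugation, f: filter): boolean {
    const features: (filter | undefined)[] = [c.type, c.form, c.mood, c.person, c.number, c.tense]
    return features.some(feature => feature === f)
}

// Keep only the conjugations that match every filter.
function query(conjugations: conjugation[], ...filters: filter[]): conjugation[] {
    return conjugations.filter(c => filters.every(f => hasMatch(c, f)))
}

// A handful of forms of "fazer", for illustration only.
const sample: conjugation[] = [
    { value: "faço", type: "FINITE", mood: "INDICATIVE", person: "FIRST", number: "SINGULAR", tense: "PRESENT" },
    { value: "fazes", type: "FINITE", mood: "INDICATIVE", person: "SECOND", number: "SINGULAR", tense: "PRESENT" },
    { value: "faça", type: "FINITE", mood: "SUBJUNCTIVE", person: "FIRST", number: "SINGULAR", tense: "PRESENT" },
    { value: "fazendo", type: "NON-FINITE", form: "GERUND" },
]

console.log(query(sample, "INDICATIVE", "PRESENT").map(c => c.value))
console.log(query(sample, "GERUND").map(c => c.value))
```

Argument order doesn't matter here, since each filter is checked independently against all of a conjugation's properties.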

// returns an array of all indicative present tense conjugations. 
verb.query("INDICATIVE", "PRESENT") 
// returns an array with a single conjugation, the only gerund of the verb. 
verb.query("GERUND") 
// returns all first person singular conjugations of the subjunctive mood.
verb.query("SUBJUNCTIVE","SINGULAR","FIRST") 

This is flexible, easy to type and intuitive to read. TypeScript provides hints for all the values, so it's easy to type even though PERSONAL-INFINITIVE is long and awkward.

I'm pretty happy with this progress. I learnt a lot about Portuguese verbs, and I got good experience with data modelling and with understanding how my mental model affects the structure of my solutions.

listening to

Argentine National Anthem in Qatar: ¡Coronados de gloria vivamos, o juremos con gloria morir!
May we live crowned with glory, or let us swear to die with glory
