
FormaleSystemeInLean.Lecture1

Formalisation of lecture 1: this file covers the definitions of words and languages, operations on words and languages, and theorems about the rules these operations satisfy. The slides are available at https://iccl.inf.tu-dresden.de/web/Formale_Systeme_(WS2025)#BEtabid1-2 (in German).

On slide 28, an alphabet is defined as a nonempty finite set of symbols. In Lean it is more convenient to use a type here instead of a set. The elements of Sigma could be anything: Unicode characters, numbers, strings, ... The only restriction we make is assuming that, given two alphabet symbols of type Sigma, we can decide whether they are equal or not. Otherwise it would be impossible to compare words.
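
A minimal sketch of this design choice follows. It uses a hypothetical two-letter alphabet AB that is not part of the file. Deriving DecidableEq supplies exactly the decidability assumption described above. As a result, `decide` can compare single symbols and, through Lean's instance for lists, whole words.

```lean
-- Hypothetical two-letter alphabet, used only for illustration.
inductive AB where
  | a
  | b
  deriving DecidableEq, Repr

-- Symbols can be compared because equality on AB is decidable.
example : AB.a ≠ AB.b := by decide

-- Words are lists of symbols; DecidableEq AB yields DecidableEq (List AB).
example : [AB.a, AB.b] ≠ [AB.b, AB.a] := by decide
example : [AB.a, AB.b] = [AB.a] ++ [AB.b] := by decide
```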

@[reducible, inline]
abbrev Word (Sigma : Type u) :
Type u

Words are merely lists over some alphabet Sigma.

instance instMulWord {Sigma : Type u} :
Mul (Word Sigma)

Concatenating two words u and v simply means appending list v to list u. This typeclass instance enables us to write * as an infix operator.

theorem Word.mul_eq {Sigma : Type u} (u v : Word Sigma) :
u * v = u ++ v

theorem Word.mul_assoc {Sigma : Type u} (u v w : Word Sigma) :
u * v * w = u * (v * w)

Concatenation of words is associative.
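
The following self-contained sketch mirrors the definitions above. It assumes that the Mul instance is implemented as list append. Word and instMulWord are redeclared here for illustration only. Under that assumption, Word.mul_eq holds by rfl, and associativity reduces to List.append_assoc from Lean's core library.

```lean
universe u

-- Redeclared for illustration; the instance body is an assumption.
abbrev Word (Sigma : Type u) := List Sigma

-- Concatenation of words as list append.
instance instMulWord {Sigma : Type u} : Mul (Word Sigma) := ⟨fun u v => u ++ v⟩

-- Word.mul_eq holds by unfolding the instance.
example {Sigma : Type u} (u v : Word Sigma) : u * v = u ++ v := rfl

-- Associativity reduces to the core lemma List.append_assoc.
example {Sigma : Type u} (u v w : Word Sigma) : u * v * w = u * (v * w) :=
  List.append_assoc u v w

-- A concrete word over Char: "ab" * "c" = "abc".
example : (['a', 'b'] : Word Char) * ['c'] = ['a', 'b', 'c'] := rfl
```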

Lean's built-in list type already offers predicates for prefix, infix and suffix as defined in the lecture (slide 30).
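
In Lean's core library, these predicates are List.IsPrefix, List.IsInfix and List.IsSuffix, with the notations <+:, <:+: and <:+. Each one is an existential statement about ++. Concrete instances can therefore be proven by supplying the missing pieces, as in this sketch over Char. The word abc is our own example and not necessarily the one from slide 30.

```lean
-- "ab" is a prefix of "abc": ab ++ c = abc
example : ['a', 'b'] <+: ['a', 'b', 'c'] := ⟨['c'], rfl⟩

-- "b" is an infix of "abc": a ++ b ++ c = abc
example : ['b'] <:+: ['a', 'b', 'c'] := ⟨['a'], ['c'], rfl⟩

-- "c" is a suffix of "abc": ab ++ c = abc
example : ['c'] <:+ ['a', 'b', 'c'] := ⟨['a', 'b'], rfl⟩

-- ε is a prefix, an infix and a suffix of every word.
example (w : List Char) : [] <+: w ∧ [] <:+: w ∧ [] <:+ w :=
  ⟨⟨w, rfl⟩, ⟨[], w, rfl⟩, ⟨w, List.append_nil w⟩⟩
```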

For every alphabet Sigma, there is an empty word ε. Since we defined words as lists with elements of type Sigma, ε is just the empty list []. ε is the identity element for concatenation of words:

theorem epsilon_prefix_infix_suffix {Sigma : Type u} (w : Word Sigma) :

Example from slide 30 (follows from list lemmas).

theorem append_nil {α : Type u_1} (L : List α) :
L.append [] = L

Auxiliary result for epsilon_concat.

theorem epsilon_concat {Sigma : Type u} (w : Word Sigma) :
w * [] = w

w * ε = w

theorem concat_epsilon {Sigma : Type u} (w : Word Sigma) :
[] * w = w

ε * w = w
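
The two identity laws are not equally easy in Lean, as this sketch over plain List Char shows. List.append recurses on its first argument, so [] ++ w = w holds by rfl. By contrast, w ++ [] = w needs an induction on w. This asymmetry is presumably why an auxiliary lemma like append_nil is used for epsilon_concat. The name append_nil_sketch is hypothetical.

```lean
-- ε * w = w: List.append reduces on its first argument, so rfl suffices.
example (w : List Char) : [] ++ w = w := rfl

-- w * ε = w: requires induction on w (hypothetical lemma name).
theorem append_nil_sketch (w : List Char) : w ++ [] = w := by
  induction w with
  | nil => rfl
  -- (a :: w) ++ [] reduces to a :: (w ++ []); then apply the hypothesis.
  | cons a w ih => exact congrArg (List.cons a) ih
```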

@[reducible, inline]
abbrev Language (Sigma : Type u) :
Type u

A language is just a set of words.

def sigma_star {Sigma : Type u} :
Language Sigma

The "biggest language" Σ* contains all words over Σ.

def L_empty {Sigma : Type u} :
Language Sigma

The empty language contains no words.

def L_eps {Sigma : Type u} :
Language Sigma

The language {ε} contains only the empty word ε. Unlike ∅, it is not empty.

theorem sigma_star_subset {Sigma : Type u} (L : Language Sigma) :
L ⊆ sigma_star

Every language over Σ is a subset of Σ*.

theorem L_eps_subset {Sigma : Type u} (L : Language Sigma) :

The empty language is a subset of any language.

theorem L_eps_mem {Sigma : Type u} {w : Word Sigma} :
w ∈ L_eps ↔ w = []

A word w is contained in the language {ε} iff w = ε.
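
Here is a self-contained sketch of these definitions and lemmas using Mathlib's Set. The bodies of sigma_star, L_empty and L_eps are assumptions (Set.univ, ∅ and {[]}), redeclared here for illustration, and may differ from the file. With these bodies, all three lemmas hold just by unfolding definitions.

```lean
import Mathlib.Data.Set.Basic

universe u

-- Redeclared for illustration; the bodies below are assumptions.
abbrev Word (Sigma : Type u) := List Sigma
abbrev Language (Sigma : Type u) := Set (Word Sigma)

def sigma_star {Sigma : Type u} : Language Sigma := Set.univ
def L_empty {Sigma : Type u} : Language Sigma := ∅
def L_eps {Sigma : Type u} : Language Sigma := {[]}

-- Every language is a subset of Σ*: membership in Set.univ unfolds to True.
example {Sigma : Type u} (L : Language Sigma) : L ⊆ sigma_star :=
  fun _ _ => trivial

-- ∅ is a subset of every language: membership in ∅ unfolds to False.
example {Sigma : Type u} (L : Language Sigma) : L_empty ⊆ L :=
  fun _ h => False.elim h

-- w ∈ {ε} unfolds to w = [].
example {Sigma : Type u} (w : Word Sigma) : w ∈ L_eps ↔ w = [] :=
  Iff.rfl
```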

instance instMulLanguage {Sigma : Type u} :
Mul (Language Sigma)

Concatenation of languages.

theorem Language.mul_assoc {Sigma : Type u} (L₁ L₂ L₃ : Language Sigma) :
L₁ * L₂ * L₃ = L₁ * (L₂ * L₃)

Concatenation of languages is associative.
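
The body of the Mul instance is not shown above. This sketch with Mathlib's Set assumes the usual definition, L₁ * L₂ = {uv | u ∈ L₁, v ∈ L₂}, redeclared here for illustration. Under that assumption, membership in a concatenation is proven by naming the two factors.

```lean
import Mathlib.Data.Set.Basic

universe u

abbrev Word (Sigma : Type u) := List Sigma
abbrev Language (Sigma : Type u) := Set (Word Sigma)

-- Assumed definition: L₁ * L₂ contains every u ++ v with u ∈ L₁ and v ∈ L₂.
instance {Sigma : Type u} : Mul (Language Sigma) :=
  ⟨fun L₁ L₂ => {w | ∃ u ∈ L₁, ∃ v ∈ L₂, w = u ++ v}⟩

-- "ab" ∈ {a} * {b}: choose u = "a" and v = "b".
example : (['a', 'b'] : Word Char) ∈ ({['a']} : Language Char) * ({['b']} : Language Char) :=
  ⟨['a'], rfl, ['b'], rfl, rfl⟩
```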

def Language.complement {Sigma : Type u} (L : Language Sigma) :
Language Sigma

Defining the complement of a set only makes sense if we know the "universe" of all elements. For languages, this is the set of all words over the alphabet Sigma, i.e. sigma_star. So the complement of a language is defined relative to sigma_star.

theorem diff_via_inter {Sigma : Type u} (L₁ L₂ : Language Sigma) :
L₁ \ L₂ = L₁ ∩ L₂.complement

The difference between two languages can be expressed with intersection and complement.
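
This sketch assumes the complement is taken relative to sigma_star, that is, sigma_star \ L. The exact body in the file is not shown. Under that assumption, diff_via_inter follows by unfolding membership on both sides. The sketch uses Mathlib's Set, with the names redeclared for illustration.

```lean
import Mathlib.Data.Set.Basic

universe u

abbrev Word (Sigma : Type u) := List Sigma
abbrev Language (Sigma : Type u) := Set (Word Sigma)

def sigma_star {Sigma : Type u} : Language Sigma := Set.univ

-- Assumed body: complement relative to Σ*.
def Language.complement {Sigma : Type u} (L : Language Sigma) : Language Sigma :=
  sigma_star \ L

-- w ∈ L₁ \ L₂ unfolds to w ∈ L₁ ∧ w ∉ L₂, and
-- w ∈ L₁ ∩ L₂.complement unfolds to w ∈ L₁ ∧ (w ∈ sigma_star ∧ w ∉ L₂).
example {Sigma : Type u} (L₁ L₂ : Language Sigma) : L₁ \ L₂ = L₁ ∩ L₂.complement :=
  Set.ext fun _ => ⟨fun h => ⟨h.1, trivial, h.2⟩, fun h => ⟨h.1, h.2.2⟩⟩
```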

def Language.pow {Sigma : Type u} (L : Language Sigma) :
Nat → Language Sigma

For languages, we can also apply concatenation repeatedly and define this via powers.

instance instNatPowLanguage {Sigma : Type u} :
NatPow (Language Sigma)

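
The recursion behind Language.pow is not shown above. This sketch assumes one plausible version: L^0 = {ε} and L^(n+1) = L^n * L. The file's actual definition may differ. With this assumed choice, L^n = L * L^(n-1) would indeed need a separate proof. Membership in a power is then witnessed by splitting the word step by step. All names are redeclared for illustration.

```lean
import Mathlib.Data.Set.Basic

universe u

abbrev Word (Sigma : Type u) := List Sigma
abbrev Language (Sigma : Type u) := Set (Word Sigma)

def L_eps {Sigma : Type u} : Language Sigma := {[]}

-- Assumed definition of concatenation.
instance {Sigma : Type u} : Mul (Language Sigma) :=
  ⟨fun L₁ L₂ => {w | ∃ u ∈ L₁, ∃ v ∈ L₂, w = u ++ v}⟩

-- Assumed recursion: L⁰ = {ε} and Lⁿ⁺¹ = Lⁿ * L.
def Language.pow {Sigma : Type u} (L : Language Sigma) : Nat → Language Sigma
  | 0 => L_eps
  | n + 1 => Language.pow L n * L

-- "aa" ∈ {a}²: "aa" = "a" ++ "a", and "a" ∈ {a}¹ because "a" = ε ++ "a".
example : (['a', 'a'] : Word Char) ∈ Language.pow ({['a']} : Language Char) 2 :=
  ⟨['a'], ⟨[], rfl, ['a'], rfl, rfl⟩, ['a'], rfl, rfl⟩
```
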
def Language.kstar {Sigma : Type u} (L : Language Sigma) :
Language Sigma

Finally, we define the Kleene star and a notation for it.

def Language.plus {Sigma : Type u} (L : Language Sigma) :
Language Sigma

Definition of the "⁺" operator, which is the Kleene star without the power n = 0.
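
This sketch assumes that L* is the union of all powers L^n and that L⁺ is the union of the powers with n > 0. The file's exact definitions are not shown. Under these assumptions, the empty word lies in every L*, even in ∅*, because it lies in L^0 = {ε]. L⁺, however, needs a witness with n > 0. The sketch is self-contained, with all names redeclared for illustration.

```lean
import Mathlib.Data.Set.Basic

universe u

abbrev Word (Sigma : Type u) := List Sigma
abbrev Language (Sigma : Type u) := Set (Word Sigma)

def L_empty {Sigma : Type u} : Language Sigma := ∅
def L_eps {Sigma : Type u} : Language Sigma := {[]}

-- Assumed definition of concatenation.
instance {Sigma : Type u} : Mul (Language Sigma) :=
  ⟨fun L₁ L₂ => {w | ∃ u ∈ L₁, ∃ v ∈ L₂, w = u ++ v}⟩

-- Assumed recursion: L⁰ = {ε} and Lⁿ⁺¹ = Lⁿ * L.
def Language.pow {Sigma : Type u} (L : Language Sigma) : Nat → Language Sigma
  | 0 => L_eps
  | n + 1 => Language.pow L n * L

-- Assumed definitions of L* and L⁺ as unions of powers.
def Language.kstar {Sigma : Type u} (L : Language Sigma) : Language Sigma :=
  {w | ∃ n, w ∈ L.pow n}
def Language.plus {Sigma : Type u} (L : Language Sigma) : Language Sigma :=
  {w | ∃ n > 0, w ∈ L.pow n}

-- ε ∈ ∅*: take n = 0, since ∅⁰ = {ε}.
example : ([] : Word Char) ∈ (L_empty : Language Char).kstar := ⟨0, rfl⟩

-- "a" ∈ {a}⁺: take n = 1 and split "a" as ε ++ "a".
example : (['a'] : Word Char) ∈ ({['a']} : Language Char).plus :=
  ⟨1, by decide, [], rfl, ['a'], rfl, rfl⟩
```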

The first four equalities from slide 35 follow directly from set theory. For example:

theorem language_inter {Sigma : Type u} (L₁ L₂ : Language Sigma) :
L₁ ∩ L₂ = L₂ ∩ L₁

For the remaining three identities, refer to Set.lean.

Using the complement of a language, we can also prove De Morgan's laws.

theorem de_morgan_rule1 {Sigma : Type u} (L₁ L₂ : Language Sigma) :
(L₁ ∪ L₂).complement = L₁.complement ∩ L₂.complement

theorem de_morgan_rule2 {Sigma : Type u} (L₁ L₂ : Language Sigma) :
(L₁ ∩ L₂).complement = L₁.complement ∪ L₂.complement

Note that this theorem requires classical logic.

theorem double_complement {Sigma : Type u} (L : Language Sigma) :
L.complement.complement = L

We can also prove that the complement of the complement of a language L is again L (which also requires classical logic).

theorem pow_as_concat {Sigma : Type u} {n : Nat} (L : Language Sigma) :
n > 0 → L ^ n = L * L ^ (n - 1)

This theorem will come in handy for many proofs. Although it might seem trivial, it does not immediately follow from the definition.

In some cases, it makes sense to think of the Kleene star of a language L as the language of all words obtained by concatenating a list of words from L. We can prove that this is equivalent to our original definition.

theorem Language.mem_pow {Sigma : Type u} {n : Nat} (L : Language Sigma) (w : Word Sigma) :
w ∈ L ^ n ↔ ∃ (l : List (Word Sigma)), w = l.flatten ∧ l.length = n ∧ ∀ (u : Word Sigma), u ∈ l → u ∈ L

We first show a stronger result for powers: a word is in L^n exactly when it is the concatenation of a list of n words from L.
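
For example, suppose L contains the words a and bc. Then abc ∈ L^2 is witnessed by the list [a, bc], which flattens to abc and has length 2. Both list facts can be checked in core Lean. The words are our own example.

```lean
-- The witness list [a, bc] flattens to abc ...
example : [['a'], ['b', 'c']].flatten = ['a', 'b', 'c'] := rfl

-- ... and has length 2, matching the exponent.
example : [['a'], ['b', 'c']].length = 2 := rfl
```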

theorem Language.mem_kstar {Sigma : Type u} (L : Language Sigma) (w : Word Sigma) :
w ∈ L* ↔ ∃ (l : List (Word Sigma)), w = l.flatten ∧ ∀ (u : Word Sigma), u ∈ l → u ∈ L

Since the kstar operation is defined via powers, we can now use the previous result and ignore the length of the list. If a word is in L*, then it must be in some power of L, and we obtain the list from Language.mem_pow. For the other direction, we have a list of words from L and want to show that the flattened list is contained in L*. Here we use the list's length as the exponent n required for membership in L* and then apply mem_pow again.

theorem kstar_subset {Sigma : Type u} (L : Language Sigma) (n : Nat) :
L ^ n ⊆ L*

Every power of a language L is a subset of L*.

theorem first_power {Sigma : Type u} (L : Language Sigma) :
L ^ 1 = L

Another example of something seemingly obvious that has to be proven explicitly before it can be used in other theorems.

theorem mul_eq_append {Sigma : Type u} (u v : Word Sigma) :
u * v = u ++ v

theorem add_exp {Sigma : Type u} (L : Language Sigma) (m n : Nat) :
L ^ n * L ^ m = L ^ (n + m)

Product rule for exponents: when concatenating powers of a language, we can add the exponents, just as when multiplying numbers.

theorem distr_concat_union_l {Sigma : Type u} (L₁ L₂ L₃ : Language Sigma) :
(L₁ ∪ L₂) * L₃ = L₁ * L₃ ∪ L₂ * L₃

Concatenation of languages distributes over union (right side).

theorem distr_concat_union_r {Sigma : Type u} (L₁ L₂ L₃ : Language Sigma) :
L₁ * (L₂ ∪ L₃) = L₁ * L₂ ∪ L₁ * L₃

Concatenation of languages distributes over union (left side).

theorem L_eps_mul {Sigma : Type u} (L : Language Sigma) :
L ≠ L_empty → L_eps * L = L

The language containing only ε is the identity element for concatenation of languages. Since concatenation is not commutative, we need proofs for both {ε} * L = L and L * {ε} = L.

theorem mul_L_eps {Sigma : Type u} (L : Language Sigma) :
L ≠ L_empty → L * L_eps = L
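
Here is a sketch of the left identity law with Mathlib's Set. It assumes the usual definition of concatenation and L_eps = {[]}, both redeclared here for illustration. The key step is that [] ++ v reduces to v.

```lean
import Mathlib.Data.Set.Basic

universe u

abbrev Word (Sigma : Type u) := List Sigma
abbrev Language (Sigma : Type u) := Set (Word Sigma)

def L_eps {Sigma : Type u} : Language Sigma := {[]}

-- Assumed definition of concatenation.
instance {Sigma : Type u} : Mul (Language Sigma) :=
  ⟨fun L₁ L₂ => {w | ∃ u ∈ L₁, ∃ v ∈ L₂, w = u ++ v}⟩

-- {ε} * L = L
example {Sigma : Type u} (L : Language Sigma) : L_eps * L = L := by
  apply Set.ext
  intro w
  constructor
  · -- w = u ++ v with u = ε and v ∈ L, hence w = v ∈ L
    intro ⟨u, hu, v, hv, h⟩
    have hu' : u = [] := hu
    subst hu'
    subst h
    exact hv
  · -- split w as ε ++ w
    intro hw
    exact ⟨[], rfl, w, hw, rfl⟩
```
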

theorem empty_mul {Sigma : Type u} (L : Language Sigma) :
L_empty * L = L_empty

The empty language ∅ is an annihilating element for concatenation (left).

theorem mul_empty {Sigma : Type u} (L : Language Sigma) :
L * L_empty = L_empty

The empty language ∅ is an annihilating element for concatenation (right).

theorem kstar_eq_plus_union_eps {Sigma : Type u} (L : Language Sigma) :

The Kleene closure of a language is its plus closure together with the empty word, i.e. L* = L⁺ ∪ {ε}.

theorem succ_pow_empty {Sigma : Type u} (n : Nat) :
n > 0 → L_empty.pow n = L_empty

All powers of ∅ except ∅⁰ are ∅, since ∅⁰ = {ε}.

theorem pow_as_concat_comm {Sigma : Type u} (L : Language Sigma) (n : Nat) :
L * L ^ (n - 1) = L ^ (n - 1) * L

Using the earlier results first_power and add_exp, we can show that the order does not matter when the nth power of a language L is written as the concatenation of L with L^(n-1).
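
Spelled out, the argument is a chain of rewrites. First, first_power turns L into L^1. Next, add_exp merges the exponents, and commutativity of addition on Nat swaps them. Finally, the same two lemmas split the power again:

```latex
L \cdot L^{n-1} = L^{1} \cdot L^{n-1} = L^{1 + (n-1)} = L^{(n-1) + 1} = L^{n-1} \cdot L^{1} = L^{n-1} \cdot L
```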

theorem kstar_plus {Sigma : Type u} (L : Language Sigma) :
L⁺ = L* * L

theorem kstar_eq_L_minus_eps {Sigma : Type u} [BEq Sigma] (L : Language Sigma) :
L* = (L \ L_eps)*

Removing the empty word from a language L does not change L*.