Documentation

FormaleSystemeInLean.Lecture1

Formalisation of lecture 1: This file covers the definition of words and languages as well as operations on words and languages and theorems about rules applying to these operations. The slides are available at https://iccl.inf.tu-dresden.de/web/Formale_Systeme_(WS2025)#BEtabid1-2 (German)

On slide 28 an alphabet is defined as a nonempty finite set of symbols. In Lean it is more convenient to use a type here instead of a set. The elements of Sigma could be anything: Unicode characters, numbers, strings... The only restriction we make is assuming that, given two alphabet symbols of type Sigma, we can decide whether they are equal or not. Otherwise it would be impossible to compare words.

@[reducible, inline]

abbrev Word (Sigma : Type u) :

Words are merely lists over some alphabet Sigma.

Equations

Word Sigma = List Sigma


instance instMulWord {Sigma : Type u} :

Mul (Word Sigma)

Concatenating two words u and v simply means appending list v to list u. This typeclass instance enables us to write * as an infix operator.

Equations

instMulWord = { mul := fun (u v : Word Sigma) => List.append u v }

theorem Word.mul_eq {Sigma : Type u} (u v : Word Sigma) :

u * v = u ++ v

theorem Word.mul_assoc {Sigma : Type u} (u v w : Word Sigma) :

u * v * w = u * (v * w)

Concatenation of words is associative.
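The definitions above can be sketched in a few lines of plain Lean 4. This is a minimal, self-contained sketch that assumes no project files; the declarations mirror the signatures shown in this documentation, but the proof terms are illustrative and may differ from the project's own proofs.

```lean
universe u

-- A word is a list of alphabet symbols.
abbrev Word (Sigma : Type u) := List Sigma

-- Concatenation as multiplication, so that `u * v` is available.
instance {Sigma : Type u} : Mul (Word Sigma) :=
  ⟨fun u v => List.append u v⟩

-- `*` unfolds to list append by definition.
theorem Word.mul_eq {Sigma : Type u} (u v : Word Sigma) : u * v = u ++ v := rfl

-- Associativity is inherited from the library lemma `List.append_assoc`.
theorem Word.mul_assoc {Sigma : Type u} (u v w : Word Sigma) :
    u * v * w = u * (v * w) :=
  List.append_assoc u v w

-- Concrete check over the alphabet `Char` (an illustrative choice).
example : (['a', 'b'] : Word Char) * (['c'] : Word Char) = ['a', 'b', 'c'] := rfl
```

Because Word is an abbreviation, every lemma about List applies to words without any conversion.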

def some_word :

Equations

some_word = ['S', 't', 'a', 'u', 'b', 'e', 'c', 'k', 'e', 'n']


def another_word :

Equations

another_word = ['A', 'l', 't', 'b', 'a', 'u', 'c', 'h', 'a', 'r', 'm', 'e']


Lean's built-in list type already offers predicates for prefix, infix and suffix as defined in the lecture (slide 30).

For every alphabet Sigma, there is an empty word ε. Since we defined words as lists with elements of type Sigma, ε is just the empty list []. ε is the identity element for concatenation of words:

theorem epsilon_prefix_infix_suffix {Sigma : Type u} (w : Word Sigma) :

[] <+: w ∧ [] <:+: w ∧ [] <:+ w

Example from slide 30 (follows from list lemmas)
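To make the three list predicates concrete, the following sketch spells out witnesses by hand. It assumes the core definitions of `<+:`, `<:+` and `<:+:` as existential statements about `++`; the word `demo` is a hypothetical name introduced only for this illustration.

```lean
-- Hypothetical example word over the alphabet `Char`.
def demo : List Char := ['S', 't', 'a', 'u', 'b', 'e', 'c', 'k', 'e', 'n']

-- `l₁ <+: l₂` unfolds to `∃ t, l₁ ++ t = l₂`, so the remaining suffix is a witness.
example : ['S', 't', 'a', 'u'] <+: demo :=
  ⟨['b', 'e', 'c', 'k', 'e', 'n'], rfl⟩

-- `l₁ <:+ l₂` unfolds to `∃ t, t ++ l₁ = l₂`.
example : ['b', 'e', 'c', 'k', 'e', 'n'] <:+ demo :=
  ⟨['S', 't', 'a', 'u'], rfl⟩

-- `l₁ <:+: l₂` unfolds to `∃ s t, s ++ l₁ ++ t = l₂`.
example : ['t', 'a', 'u', 'b', 'e'] <:+: demo :=
  ⟨['S'], ['c', 'k', 'e', 'n'], rfl⟩

-- The empty word is a prefix, infix and suffix of every word.
example {α : Type} (w : List α) : [] <+: w ∧ [] <:+: w ∧ [] <:+ w :=
  ⟨⟨w, rfl⟩, ⟨[], w, rfl⟩, ⟨w, List.append_nil w⟩⟩
```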

theorem append_nil {α : Type u_1} (L : List α) :

L.append [] = L

Auxiliary result for epsilon_concat

theorem epsilon_concat {Sigma : Type u} (w : Word Sigma) :

w * [] = w

w * ε = w

theorem concat_epsilon {Sigma : Type u} (w : Word Sigma) :

[] * w = w

ε * w = w
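The two identity laws are not proved the same way, which is why an auxiliary lemma is needed for one of them. A minimal sketch, assuming plain Lean 4 and using the core lemma `List.append_nil` in place of the auxiliary result above:

```lean
universe u

abbrev Word (Sigma : Type u) := List Sigma

instance {Sigma : Type u} : Mul (Word Sigma) :=
  ⟨fun u v => List.append u v⟩

-- Right identity: `List.append` recurses on its first argument, so
-- `w ++ []` does not reduce for a variable `w` and a lemma is required.
theorem epsilon_concat {Sigma : Type u} (w : Word Sigma) : w * [] = w :=
  List.append_nil w

-- Left identity: `[] ++ w` reduces to `w` by computation alone.
theorem concat_epsilon {Sigma : Type u} (w : Word Sigma) : [] * w = w := rfl
```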

@[reducible, inline]

abbrev Language (Sigma : Type u) :

A language is just a set of words.

Equations

Language Sigma = Set (Word Sigma)


def sigma_star {Sigma : Type u} :

The "biggest language" Σ* contains all words over Σ

Equations

sigma_star x✝ = True


def L_empty {Sigma : Type u} :

The empty language contains no words

Equations

L_empty x✝ = False


def L_eps {Sigma : Type u} :

The language containing only ε is not empty

Equations

L_eps w = (w = [])


theorem sigma_star_subset {Sigma : Type u} (L : Language Sigma) :

L ⊆ sigma_star

Every language over Σ is a subset of Σ*

theorem L_eps_subset {Sigma : Type u} (L : Language Sigma) :

The empty language is a subset of any language.

theorem L_eps_mem {Sigma : Type u} {w : Word Sigma} :

w ∈ L_eps ↔ w = []

A word w is contained in the language {ε} iff w = ε.
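The equations shown for sigma_star, L_empty and L_eps suggest that a set is represented as a predicate. The sketch below makes that assumption explicit by defining Language directly as a predicate on words (the project itself uses its own Set type), and it writes membership as function application instead of ∈.

```lean
universe u

abbrev Word (Sigma : Type u) := List Sigma

-- Assumption for this sketch: a language is a predicate on words.
abbrev Language (Sigma : Type u) := Word Sigma → Prop

def sigma_star {Sigma : Type u} : Language Sigma := fun _ => True
def L_empty {Sigma : Type u} : Language Sigma := fun _ => False
def L_eps {Sigma : Type u} : Language Sigma := fun w => w = []

-- Every word is in Σ*.
example : (sigma_star : Language Nat) [1, 2, 3] := by simp [sigma_star]

-- No word is in ∅, not even ε.
example : ¬ (L_empty : Language Nat) [] := by simp [L_empty]

-- {ε} contains ε and nothing else.
example : (L_eps : Language Nat) [] := by simp [L_eps]
example : ¬ (L_eps : Language Nat) [0] := by simp [L_eps]
```

The examples show the difference between ∅ and {ε}: the first has no members, the second has exactly one.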

instance instMulLanguage {Sigma : Type u} :

Mul (Language Sigma)

Concatenation of Languages

Equations

instMulLanguage = { mul := fun (L1 L2 : Language Sigma) (w : Word Sigma) => ∃ (u : Word Sigma), u ∈ L1 ∧ ∃ (v : Word Sigma), v ∈ L2 ∧ w = u * v }

theorem Language.mul_assoc {Sigma : Type u} (L₁ L₂ L₃ : Language Sigma) :

L₁ * L₂ * L₃ = L₁ * (L₂ * L₃)

Concatenation of languages is associative.
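Membership in a concatenation is an existential statement, so proving it amounts to supplying the two halves of the word. The following sketch assumes languages are modelled as predicates on words (a stand-in for the project's Set type); the languages `La` and `Lb` are hypothetical examples.

```lean
universe u

abbrev Word (Sigma : Type u) := List Sigma

instance {Sigma : Type u} : Mul (Word Sigma) :=
  ⟨fun u v => List.append u v⟩

-- Assumption for this sketch: a language is a predicate on words.
abbrev Language (Sigma : Type u) := Word Sigma → Prop

-- w ∈ L1 * L2 iff w splits as u * v with u ∈ L1 and v ∈ L2.
instance {Sigma : Type u} : Mul (Language Sigma) :=
  ⟨fun L1 L2 w => ∃ u : Word Sigma, L1 u ∧ ∃ v : Word Sigma, L2 v ∧ w = u * v⟩

-- Hypothetical example languages {a} and {b}.
def La : Language Char := fun w => w = ['a']
def Lb : Language Char := fun w => w = ['b']

-- "ab" ∈ {a} * {b}, witnessed by the split u = "a", v = "b".
example : (La * Lb) ['a', 'b'] := by
  show ∃ u : Word Char, La u ∧ ∃ v : Word Char, Lb v ∧ ['a', 'b'] = u * v
  exact ⟨['a'], rfl, ['b'], rfl, rfl⟩
```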

def Language.complement {Sigma : Type u} (L : Language Sigma) :

Defining the complement of a set only makes sense if we know the "universe" of all elements. For languages this is the set of all words over the alphabet Sigma, sigma_star. So we can define the complement of a language as follows:

Equations

L.complement = sigma_star \ L


theorem diff_via_inter {Sigma : Type u} (L₁ L₂ : Language Sigma) :

L₁ \ L₂ = L₁ ∩ L₂.complement

The difference between two languages can be expressed with intersection and complement.

def Language.pow {Sigma : Type u} (L : Language Sigma) :

Nat → Language Sigma

A language can be concatenated with itself repeatedly. We define this via powers: L⁰ = {ε} and Lⁿ⁺¹ = L * Lⁿ.

Equations

L.pow Nat.zero = fun (w : Word Sigma) => w = []
L.pow n.succ = L * L.pow n


instance instNatPowLanguage {Sigma : Type u} :

NatPow (Language Sigma)

Equations

instNatPowLanguage = { pow := fun (L : Language Sigma) (n : Nat) => L.pow n }

def Language.kstar {Sigma : Type u} (L : Language Sigma) :

Finally we define the Kleene Star and notation for it.

Equations

L* w = ∃ (n : Nat), w ∈ L ^ n


def «term_*» :

Lean.TrailingParserDescr

Equations

«term_*» = Lean.ParserDescr.trailingNode `«term_*» 1024 1024 (Lean.ParserDescr.symbol "*")


def Language.plus {Sigma : Type u} (L : Language Sigma) :

Definition of the "⁺" operator, which is the Kleene star without the power n = 0.

Equations

L⁺ w = ∃ (n : Nat), n > 0 ∧ w ∈ L ^ n


def «term_⁺» :

Lean.TrailingParserDescr

Equations

«term_⁺» = Lean.ParserDescr.trailingNode `«term_⁺» 1024 1024 (Lean.ParserDescr.symbol "⁺")

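Powers, the Kleene star and the plus operator can be sketched together as follows. The sketch assumes languages are predicates on words (a stand-in for the project's Set type) and omits the postfix notation; the theorem name `eps_mem_kstar` is hypothetical and does not appear in the project.

```lean
universe u

abbrev Word (Sigma : Type u) := List Sigma

instance {Sigma : Type u} : Mul (Word Sigma) :=
  ⟨fun u v => List.append u v⟩

-- Assumption for this sketch: a language is a predicate on words.
abbrev Language (Sigma : Type u) := Word Sigma → Prop

instance {Sigma : Type u} : Mul (Language Sigma) :=
  ⟨fun L1 L2 w => ∃ u : Word Sigma, L1 u ∧ ∃ v : Word Sigma, L2 v ∧ w = u * v⟩

-- L⁰ = {ε}, Lⁿ⁺¹ = L * Lⁿ.
def Language.pow {Sigma : Type u} (L : Language Sigma) : Nat → Language Sigma
  | 0 => fun w => w = []
  | n + 1 => L * Language.pow L n

-- w ∈ L* iff w lies in some power of L.
def Language.kstar {Sigma : Type u} (L : Language Sigma) : Language Sigma :=
  fun w => ∃ n : Nat, Language.pow L n w

-- w ∈ L⁺ iff w lies in some positive power of L.
def Language.plus {Sigma : Type u} (L : Language Sigma) : Language Sigma :=
  fun w => ∃ n : Nat, n > 0 ∧ Language.pow L n w

-- ε ∈ L* for every L, witnessed by the exponent 0 (hypothetical lemma name).
theorem eps_mem_kstar {Sigma : Type u} (L : Language Sigma) :
    Language.kstar L [] :=
  ⟨0, rfl⟩
```

The same witness does not work for L⁺, because the exponent 0 is excluded there; ε is in L⁺ only if it can be built from words of L.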

The first four equalities from slide 35 follow directly from set theory. For example, intersection is commutative:

theorem language_inter {Sigma : Type u} (L₁ L₂ : Language Sigma) :

L₁ ∩ L₂ = L₂ ∩ L₁

For the remaining three identities, refer to Set.lean.

Using the complement of a language, we can also prove De Morgan's laws.

theorem de_morgan_rule1 {Sigma : Type u} (L₁ L₂ : Language Sigma) :

(L₁ ∪ L₂).complement = L₁.complement ∩ L₂.complement

theorem de_morgan_rule2 {Sigma : Type u} (L₁ L₂ : Language Sigma) :

(L₁ ∩ L₂).complement = L₁.complement ∪ L₂.complement

Note that this theorem requires classical logic.

theorem double_complement {Sigma : Type u} (L : Language Sigma) :

L.complement.complement = L

We can also prove that the complement of the complement of a language L is again L (which also requires classical logic).
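Why classical logic is needed becomes visible when the statements are read pointwise: membership in a complement is a negation, so the laws reduce to propositional facts about ¬, ∧ and ∨. The following sketch works on plain propositions `p` and `q` (standing for "w ∈ L₁" and "w ∈ L₂") and is not the project's proof.

```lean
-- First De Morgan law, pointwise: provable constructively.
example (p q : Prop) : ¬(p ∨ q) ↔ ¬p ∧ ¬q :=
  ⟨fun h => ⟨fun hp => h (Or.inl hp), fun hq => h (Or.inr hq)⟩,
   fun h hpq => hpq.elim h.1 h.2⟩

-- Second De Morgan law: this direction needs the law of excluded middle.
example (p q : Prop) : ¬(p ∧ q) → ¬p ∨ ¬q := fun h =>
  (Classical.em p).elim
    (fun hp => Or.inr (fun hq => h ⟨hp, hq⟩))
    (fun hnp => Or.inl hnp)

-- Double complement, pointwise: double negation elimination.
example (p : Prop) : ¬¬p → p := fun h => Classical.byContradiction h
```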

theorem pow_as_concat {Sigma : Type u} {n : Nat} (L : Language Sigma) :

n > 0 → L ^ n = L * L ^ (n - 1)

This theorem will come in handy for many proofs. Although it might seem trivial, it does not immediately follow from the definition.

In some cases, it makes sense to think about the Kleene star of some language L as the language containing words consisting of a list of words from L. We can prove that this is equivalent to our original definition.

theorem Language.mem_pow {Sigma : Type u} {n : Nat} (L : Language Sigma) (w : Word Sigma) :

w ∈ L ^ n ↔ ∃ (l : List (Word Sigma)), w = l.flatten ∧ l.length = n ∧ ∀ (u : Word Sigma), u ∈ l → u ∈ L

We first show a stronger result: for any word from L^n, we can find a corresponding list of length n.

theorem Language.mem_kstar {Sigma : Type u} (L : Language Sigma) (w : Word Sigma) :

w ∈ L* ↔ ∃ (l : List (Word Sigma)), w = l.flatten ∧ ∀ (u : Word Sigma), u ∈ l → u ∈ L

Since the kstar operation is defined via powers, we can now use the previous result and ignore the length of the list: If a word is in L* then it must be in some power of L. Then we can obtain the list from Language.mem_pow. For the other direction (when we have a list of words from L and want to show that the flattened list is contained in L*), we use the list's length as the exponent n required for membership in L* and then apply mem_pow again.
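The list-based view can be illustrated on a concrete decomposition. In this sketch the words "ab" and "c" are assumed to belong to some hypothetical language L; the list of three words flattens to a single word, and its length is the exponent used for membership in the corresponding power.

```lean
-- Three words from the hypothetical language L = {ab, c} ...
example : [['a', 'b'], ['c'], ['a', 'b']].flatten = ['a', 'b', 'c', 'a', 'b'] := rfl

-- ... whose count gives the exponent n = 3.
example : [['a', 'b'], ['c'], ['a', 'b']].length = 3 := rfl

-- The empty list of words flattens to ε, matching the power with exponent 0.
example : ([] : List (List Char)).flatten = [] := rfl
```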

theorem kstar_subset {Sigma : Type u} (L : Language Sigma) (n : Nat) :

L ^ n ⊆ L*

Every power of a language L is a subset of L*.

theorem first_power {Sigma : Type u} (L : Language Sigma) :

L ^ 1 = L

Another example of something seemingly obvious that needs to be proven explicitly in order to be used in theorems.

theorem mul_eq_append {Sigma : Type u} (u v : Word Sigma) :

u * v = u ++ v

theorem add_exp {Sigma : Type u} (L : Language Sigma) (m n : Nat) :

L ^ n * L ^ m = L ^ (n + m)

Product rule for exponents: when concatenating powers of a language we can add the exponents as we do when multiplying numbers.

theorem distr_concat_union_l {Sigma : Type u} (L₁ L₂ L₃ : Language Sigma) :

(L₁ ∪ L₂) * L₃ = L₁ * L₃ ∪ L₂ * L₃

Concatenation of languages is distributive over union (right side).

theorem distr_concat_union_r {Sigma : Type u} (L₁ L₂ L₃ : Language Sigma) :

L₁ * (L₂ ∪ L₃) = L₁ * L₂ ∪ L₁ * L₃

Concatenation of languages is distributive over union (left side).

theorem L_eps_mul {Sigma : Type u} (L : Language Sigma) :

L ≠ L_empty → L_eps * L = L

The language containing only ε is the identity element for concatenation of languages. Since concatenation is not a commutative operation, we need a proof for {ε} * L = L and for L * {ε} = L.

theorem mul_L_eps {Sigma : Type u} (L : Language Sigma) :

L ≠ L_empty → L * L_eps = L

theorem empty_mul {Sigma : Type u} (L : Language Sigma) :

L_empty * L = L_empty

The empty language ∅ is an annihilating element for concatenation. (left)

theorem mul_empty {Sigma : Type u} (L : Language Sigma) :

L * L_empty = L_empty

The empty language ∅ is an annihilating element for concatenation. (right)

theorem kstar_eq_plus_union_eps {Sigma : Type u} (L : Language Sigma) :

L* = L⁺ ∪ L_eps

The Kleene closure of a language is the same as applying the plus operator and adding the empty word.

theorem succ_pow_empty {Sigma : Type u} (n : Nat) :

n > 0 → L_empty.pow n = L_empty

All powers of ∅ (except ∅⁰) are ∅.

theorem pow_as_concat_comm {Sigma : Type u} (L : Language Sigma) (n : Nat) :

L * L ^ (n - 1) = L ^ (n - 1) * L

Using the two previous results first_power and add_exp, we can show that when writing the nth power of a language L as the concatenation of L with L^(n-1), the order does not matter.

theorem kstar_plus {Sigma : Type u} (L : Language Sigma) :

L⁺ = L* * L

theorem kstar_eq_L_minus_eps {Sigma : Type u} [BEq Sigma] (L : Language Sigma) :

L* = (L \ L_eps)*

Removing the empty word from a language L does not change L*.