Description:
Cover Automata for Finite Languages
MichaºlCadilhac
Technical Report no 0504, June 2005
revision 681
Abstract. Although re gular languages combined with nite automata are widely used and studied, man y applications
only use nite languages. Cover automata were introduced in Câmpeanuetal. ( 2001 ) as anecientwayto
represent such languages.
The bold concept is to have an automaton that recognizes not only the given language but also words longer
than any word in it. The construction of a minimal deterministic cov er automaton of a nite language results in an
automaton with fewer states than the minimal deterministic automaton that recognizes strictly the language.
In this technical report, the theory of cover automata is presented, then we focus on the algorithms that computea
minimal deterministiccov er automaton.
RÃsumÃ. Bienqueleslangagesrationnelsvusautrav ers des automatesnissoienttrsutilisÃs, beaucoupd'applications
n'utilisentaunalqueleslangagesnis. Lesautomatesdecouvertureintroduitdans Câmpeanuetal. ( 2001 )
sontunefaÃonecacedereprÃsenterceslangages.
LebutestdecrÃerunautomatequireconnaÃtunlangageplusgrandqueceluid'origine, maisquipermet dele
retrouverparseul test de la longueurdumot ÃvaluÃ. L'automate minimal construitâ¡partirdecelui-ciauramoins
d'Ãtatquel'automate minimal de l'automatequireconnaÃtstrictementlelangage.
Danscerapport technique, lathÃoriedesautomatesdecouvertureseraprÃsentÃe, puisilyseradÃtaillà les algo-
rithmesquipermettentlecalcul del'automate minimal decouverture.
Keywords
Automata, Cover Automata, Minimization, Finite Languages.
Laboratoirede Recherche et DÃveloppementdel'Epita
14-16, rue Voltaireñ F-94276 Le Kremlin-BicÅtrecedexñ France
TÃl. +33153145947 ñFax. +33153145922
michael.cadilhac@lrde.epita.fr ñ http: //www.lrde.epita.fr/òcadilh_m
2
Copying this document
Copyright c 2005 LRDE.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free
Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with
the Invariant Sections being justìCopyingthisdocumentî, no Front-Cover Texts, and no Back-Cover
Texts.
A copy of the license is provided in the le COPYING.DOC.
Contents
1
Basics
5
1.1 Denitionsand notations .................................. 5
1.2 Isomorphism and minimal automata ............................ 6
1.2.1 Preliminaries .................................... 6
1.2.2 Onthewayto minimization ............................ 7
1.2.3 Minimization algorithms .............................. 8
2
Cover automata
12
2.1 Basicidea .......................................... 12
2.2 Onthesimilarity of states .................................. 13
2.3 MinimalDFCA ....................................... 14
3
Cover minimization
16
3.1 AO ( n2 ) algorithm ...................................... 16
3.1.1 Denitions ..................................... 16
3.1.2 Algorithm ...................................... 17
3.1.3 Example ...................................... 18
3.2 AO ( n log ( n )) algorithm ................................... 19
3.2.1 Algorithm ...................................... 19
3.2.2 Example ...................................... 20
4
Implementation in the V aucanson framework
21
4.1 Implementations ...................................... 21
4.1.1 TheO ( n2 ) algorithm ................................ 21
4.1.2 TheO ( n log ( n )) algorithm .............................. 23
4.2 Performances ........................................ 25
Bibliography
27
Tableofgures
28
Index
29
Introduction
Finite languages are perhaps the most often used but the least studied family of languages in the formal
language family hierarchy. In the past century, all the works made in this eld were pieces of bigger works
that did not focus on it. Only recently, several aspects of nite languages, such as the state complexity
and decompositions, have been studied (see for example Campeanuetal. ( 1999 ), Yu ( 1999 )).
However, the wide use of nite languages1 has conduced the researcheortto focus on away to
represent nite languageseciently . This eorthasledto the creation ofaspeciceldin the automata
theory: cover automata .
First introduced by Câmpeanuetal. ( 2001 ), cover automata are an alternative representation of nite
languages. The aim was to create an automaton that can, in someway or another, eectively represents
a nite language but with fewer states than the corresponding automaton for this language. Byìfewer
statesî, we are referring to the number of states of the minimal deterministic version of each automaton.
Roughly speaking, if Lis a nite language and'the length of the longest word(s) in L, a cover
automaton for the language Lacceptsallwords in Land possibly additional words of length greater than
'. We will see in the sequel how this construction can be used to:
express nite languages as powerfully as withaclassical automaton,
reduce the size of the automaton.
Intherstpart, some basics of automata theory are introduced, together with a reminder on classical
automata minimization. Then, in the second part, cover automata theory is described, and the main
properties of these automata are presented. In the third part, the class of algorithms that computea
minimal deterministic nite cover automaton of a language is studied. Last, thenal part presents the
implementations and performances of these algorithms in the Vaucanson2 framework.
1
For instance in lexical analysis or in user interface translations (see Wood and Yu ( 1998 ))
2
Vaucanson: http: //vaucanson.lrde.epita.fr
Chapter1
Basics
This chapter presents some basics of automata theory we need in this report. First, some classical notions
are presented, then the class of algorithms that minimize an automaton. The followingdenitions are
relative to the use we will have of them ( i.e. they are not absolute). Unlessspecied otherwise, they are
taken from Sakarovitch ( 2003 ).
This part being only a reminder, the theorem and propositions listed here will not be demonstrated.
1.1 Denitionsand notations
Notation1 (Alphabet and words) . If is a set of letter, We will write for the set of words based on
the alphabet . The empty word will be denoted " . Additionally, we will note>'the set of words longer
than ' (respectively,<'the set of words shorter than ' ).
Denition1 (Language and Finite language) . A language Lonthealphabetisa (possibly empty) set
of words, that is to say, L
.
A language Lis nite if Card (L) is bounded.
Denition2 (Automaton) . An automaton Ais speciedbythefollo wing elements:
Anon-empty set Q called the set of states of A,
Anon-empty nite set called the alphabet of A,
Two subsets I and F of Q ; I being the set of initial states and F being the set ofnalones,
Afunction': Q
! 2Qcalledthe transition function . Given a state q and aword w , ' ( q ; w )
represents the set of states which have incoming transitions from q labeled by w .
Denition3 (Language of an automaton) . The language recognized by the automaton Ais
L (A) =f w 2
j9 q
i
2 I ; q
f
2 F ; q
i
w
{ q
f
g
Denition4 (Finite automaton) . We will say that an automaton is nite if the set Q is nite.
Denition5 (Real-time automaton) . An automaton is real-time if the transition function 'isdenedas
': Q
! 2Q. In other words, all the transitions are labeled by a single letter.
Notation2 (Automaton) . With the notations of Denition 2 , an automaton A will be written as a quin-
tupletA=h Q ; ; '; I ; F i . We will say that A is an automaton over .
1.2 Isomorphism and minimal automata
6
The sequel of this report will only consider real-time automata.
Denition6 (Completeness) . An automaton is complete if for all state q and for all letter a in the alphabet,
there is an outgoing transition from q labeled by a . An automaton can be completed: a sink state is added
on which every missing transitions will go to.
Denition7 (Trim) . An automaton is trimmed if all its states are reachable from the initial state and can
reach anal state. Trimming usually remove the sink state.
Denition8 (Deterministic Finite Automaton) . A Deterministic Finite Automaton (DFA) is an automaton
such that 8 q 2 Q ; 8 l 2; j ( q ; l ) j 1 and j I j=1, i.e. for all states and all letters, it exists at most one
outgoing transition labeled by this letter.
Denition9 (Non-deterministic Finite Automaton) . A Non-deterministic Finite Automaton (NFA) is the
general casein the determinism property: determinism and non-determinism are not mutually exclusive
concepts, a deterministic automaton is non-deterministic.
The reminder of this document will only consider deterministic automata.
Denition 10 (Evaluation) . An evaluation of aword w = ( w
1
; : : : ; w
n
) on an automaton Ais dened
recursively on the letters by:
Let Q
k
be the states reached by the word w0 = ( w
1
; : : : ; w
k
),
Q
k +1
is the states reached with the letter w
k +1
from the states of Q
k
,
Then the result of the evaluation is true if Q
n
, ; , false otherwise.
Notation3 (Evaluation) . Thee valuation of a word won an automaton A will be written eval ( w ; A) . If
eval
( w ; A) is true , then we say that wis recognized by A , or that wis in the language L (A) described by
A.
1.2 Isomorphism and minimal automata
1.2.1 Preliminaries
First, the reader has to be convicted that there is not only one automaton fora specied language. For
instance, these two automata recognize the same words:
p
a
q
r
a
a
Figure1.1: Automata forìa*î
The question that arises knowing that is: ìDoesanautomatonhavea canonical representation?î
It is the case with deterministic automata: the minimal deterministic automaton is unique fora given
language. For example, the minimal DFA for ìa*îistheoneonthelefton Figure 1.1 .
7
Basics
The followingdenitions and theorems will be needed to understand the algorithms that compute the
minimal automaton ofa DFA:
Denition 11 (Isomorphism of automata) . Two automata are isomorphic if they are the same without
regard to the name of the states.
Denition 12 (Minimal automaton) . An automaton Ais minimal if for all automaton Bsuchthat L (A) =
L (B), Card (A) Card (B) with Card () giving the number of states.
Theorem 1 (Unicityofthe minimal automaton) . The minimal automaton of a language L is unique
through automata isomorphism.
Denition 13 (Equivalency of states) . Let A=h Q ; ; '; I ; F i, ( p ; q ) 2 Q2 .
Suppose we reached p and q with two words w
1
and w
2
.
We say that:
w
1
is equivalent to w
2
and note w
1
w
2
if8 z 2
; w
1
z 2 L , w
2
z 2 L ,
p is equivalent to q and note p
q if w
1
w
2
.
The relation is an equivalence relation.
Intuitively, p
q if p and q have the same future .
The following theorem establishes the link between equivalency and minimal automata:
Theorem 2 ( Myhill ( 1957 )) . ADFA is minimal if and only if 8 ( p ; q ) 2 Q2 ; p , q , p . q.
1.2.2 Onthewayto minimization
From the Theorem 2 , we could easily deduce a simple algorithm that makes the minimization. Given an
oracle that would say if two states are equivalent, we could merge equivalent states (as they have the same
future ) to obtain an automaton in which no states are equivalent.
For instance, in the following automaton, q and r are equivalent:
p
q
r
a
b
s
c
c
Figure1.2: Automaton forì(a+b)cî
1.2 Isomorphism and minimal automata
8
Thus, to obtain the minimal automaton forì(a+b)cî, q and r should be merged:
p
f q ; r g
s
c
a
b
Figure1.3: Minimal automaton forì(a+b)cî
1.2.3 Minimization algorithms
In this section, we will work on the automaton A=h Q ; ; '; I ; F i.
Minimization algorithms aim atndingthe equivalences between the states. They usually work by par-
titioningtheset of states in equivalence classes, that is to say in classes in which states are all equivalent.
Therstgrossdi vision that could be made is Q = F [ Q n F . Indeed, we know that the elements of F
could not be equivalent to the elements of Q n F : only a state in F can lead to anal state on the input".
It does not mean, however, that the states in F are all equivalent.
The algorithms begin with thisrst partitioning, then make successiverenements to reach axed
point. The following theorem assures the equivalence between this task and the minimization.
Theorem 3 ( Myhill ( 1957 )) . Fora deterministic nite automaton M , the minimum number of states in
any equivalent deterministic nite automaton is the same as the number of equivalence classes of M 's
states.
Moor e'salgorithm
The algorithm of Moore ( 1956 ) is based on the following fact:
Fact 1. All equivalent states go to equivalent states under all inputs.
The time complexity of the algorithm is in O ( n2 ), with n = Card (A). Its principle is quite simple:
1
start with two groups F and Q n F
do {
3
for every group{
for every state in the group{
5
find which groupstheinputslead to
}
7
if there are differences{
partitionthegroupintosetscontainingstateswhichgotothe
9
same groups underthesame inputs
}
11
}
} while a partitioning has been made in the loop
9
Basics
Let us treat an example: we will use the automaton Mthatdescribes the language denoted by the
regular expressionì(aa+b) ab(bb) î. This automaton is real-time and deterministic, in other words, it
matches our requirements:
q
1
q
5
q
6
q
2
q
3
q
4
q
7
q
8
b
a
a
b
a
a
b
a
b
a+ b
a
a
b
b
b
Figure1.4: The automaton M
Our rstpartition is the split of Q regarding nal states:
A=f q
3
; q
8
g; B =f q
1
; q
2
; q
4
; q
5
; q
6
; q
7
g
Weshallnowndif the states in these groups go to the same group under inputs a and b . It could be
seen that the states of group A both go to states ingroup B under both inputs.
However, this is not the case for the group B ; the following table shows the result of applying the inputs
to these states (for instance, the input b leads from q
7
to q
8
, i.e. to A )
Instate: q
1
q
2
q
4
q
5
q
6
q
7
a leads to: B
B
B
B
B
B
b leads to: B
A
B
B
B
A
The input b helps us distinguish between two of the states ( q
2
and q
7
) and the rest of the states in the
group since it leads to group A for these two instead of group B . Thus we should split B into two groups
and we now have:
A=f q
3
; q
8
g; B =f q
1
; q
4
; q
5
; q
6
g; C =f q
2
; q
3
g
The next loop in the algorithm shows us that q
4
is not equivalent to the rest of group B with input a and
we must split again.
Continuing this process until we cannot distinguish between the states in any group by employing input
tests, we end up with the following classes:
A=f q
3
; q
8
g; B =f q
1
; q
5
; q
6
g; C =f q
2
g; D =f q
4
g; E =f q
7
g
1.2 Isomorphism and minimal automata
10
Considering the above theoreticaldenitions and results, we can say that all states in each group are
equivalent because they all go to the same groups under the inputs a and b .
The minimal automaton for L (M) is then:
B
C
A
D
E
a
a
b
b
a
a
b
b
Figure1.5: Minimal automaton for L (M)
Hopcroft'salgorithm
The minimization algorithm of Hopcroft ( 1971 ) is based on the same fact as Moore'sone, but its formu-
lationischanged:
Fact 2. On some input and for every group G , all the groups should not contains some states that are
predecessor of Gand some others that are not.
This reversed point of view (i.e. to consider predecessors) together with a trick in enqueueing in the
algorithms lead toatime complexity in O ( n : log ( n )). Hopcroft'salgorithmcould be written as follows:
initialize
=f F ; Q n F g,
2
put F ina"To treat"queue T ,
while Tis not empty{
4
remove as et S of T ,
for each letter in
{
6
compute X : the list of states that go in S ,
Y : the list of states that don0t,
8
for each sets
in
{
if
\ X , ; and
\ Y , ; {
10
split
in
\ X and
\ Y ,
enqueue the smaller one in T .
12
}
}
14
}
}
Let us apply this algorithm to the automaton Mof Figure 1.4 . One of the rstiteration will consider
S =f q
2
; q
7
gwiththeinput letter b . In the other set, namelyf q
0
; q
1
; q
3
; q
4
; q
5
; q
6
g, the statesf q
1
; q
6
gare
predecessors of S but the others are not. The rstsplitwill be made, and the partition will be:
=ff q
0
; q
3
; q
4
; q
5
g; f q
2
; q
7
g; f q
1
; q
6
gg
11
Basics
The last group will beenqueued, being the smaller one in the split, then the loop will continue: the
iteration with S =f q
1
; q
6
gandtheinput letter a will splitf q
0
; q
3
; q
4
; q
5
g, q
4
not being a predecessor of S .
We get:
=ff q
0
; q
3
; q
5
g; f q
2
; q
7
g; f q
1
; q
6
g; f q
4
gg
Again, the last group will beenqueuedandthe iteration with S =f 4 gandinputletter a will cause an
ultimate split off q
1
; q
6
g. We end up with the same partition as in Moore'salgorithm, that is to say:
=ff q
0
; q
3
; q
5
g; f q
2
; q
7
g; f q
1
g; f q
4
g; f q
6
gg
Chapter2
Cover automata
In this chapter, we present the theory of cover automata and detail the current results of this eld. As one
of the most important, we will show thataminimal deterministic cov er automaton fora nite language
has lessor as many states as the minimal deterministic automaton for this language.
The following is mostly taken from Câmpeanuetal. ( 2001 ).
2.1 Basicidea
A nite language has a nite number of words; some of them are the longest ones. In the use of these
languages, the length'of the longest words is often, if not always, known. An automaton fora nite
languageL, whose longest words have size', could be seen as:
A checker fora word w to be in (L[ W ) ; W
>'
,
A checker for this word w to have a size lessor equal to'.
A cover automaton supposed deterministic (DFCA) fora language Lwillnotcheckthe length of the
input. In other words, it accepts all words of Land possibly longer words than the longest ones in L. If,
what we will suppose, 'isknown1, the length check will be made before the evaluation in the automaton
and we will end up with both functionalities.
For instance, the following automaton is a cover automaton for L=f a ; b ; aa ; aaa ; bab g:
p
q
r
s
a
b
a
b
a
Figure2.1: Cover automaton for L=f a ; b ; aa ; aaa ; bab g; '=3
1
It can be computed in a time linear in jLj.
13
Cover automata
Obviously, this automaton does not recognize strictly L, for example, the wordìbabaîisaccepted
here. However, the longest words havealengthof'=3, andìbabaîislongerthanthat. Therefore, the
length check f ailing, we can say thatìbabaîisnotin L. It follows this formal denition:
Denition 14 (Cover automaton) . A deterministic nite cover automaton fora nite language L
f
of
longest words of length', isa DF AA such that L (A) \'=L
f
. We say that L (A) isa cover language
ofL
f
.
The sequel of this report will focus on the cover minimization of an automaton, that is to say the
building of a minimal deterministic cov er automaton (MDFCA) froma DFA.
2.2 Onthesimilarity of states
As we saw in Section 1.2 , the construction of a minimal automaton is based on the equivalence relation
(Denition 13 ): if p
q , p and q have the same future . In the context of cover automata, we need another
important relation:
Denition 15 (Similarity of states) . Let A=h Q ; ; '; I ; F i, ( p ; q ) 2 Q2 .
Suppose we reached p and q with, respectively, two words w
1
and w
2
.
We say that:
w
1
is similar to w
2
and note w
1
w
2
if8 z 2
such that j xz j'^j yz j'; xz 2 L , yz 2 L ,
p is similar to q and note p
q if w
1
and w
2
are the shortest paths to respectively p and q , and
w
1
w
2
.
The denedrelation is ree xive, symmetric but not transitive.
Intuitively, p
q if p and q have the same bounded future .
For instance, in Figure 2.1 , we haveìbabîìaî, ìaîìbabî but ìaî/ìbî, so p / q .
Denition 16 (Gap) . The function gap : QQ ! Ncomputesthe length of the shortest word that shows
that its two arguments are dissimilar (i.e. aw ordthatleadsto anal state from one state and that does
not from the other one). For convenience, if they are similar, the function returns'.
For instance, from the following automaton:
q
r
s
t
a
b
a
a
Figure2.2: Automaton forì(a+b)aî
2.3 MinimalDFCA
14
The gap function can be computed as follows:
gap
q
r
s
t
q
'110
r
''0
s
'0
t
'
For example, gap ( r ; q ) is 1 because the shortest word that showsa dissimilarity between r and q isìaî.
Denition 17 (Level) . For i the initial state of the automaton A=h Q ; ; '; I ; F i, we dene
8 q 2 Q ; level ( q ) =minfj w j; iw { q g
Intuitively, level ( q ) is the word that represents the shortest path to q .
2.3 MinimalDFCA
Denition 18 (Minimal DFCA) . An automaton Ais a minimal DFCA for Lifforall automata Bsuch
that Bis a cover automaton of L, Card (A) Card (B).
The main theorem in the eld of cover minimization is the equivalent of Theorem 2 but with the
relation:
Theorem 4 ( Câmpeanuetal. ( 2001 )) . ADFCA A is minimal if and only if 8 ( p ; q ) 2 Q2 ; p , q ) p / q.
A strong link can be made with the gap function: the computation of this function leads to the knowledge
of similar classes: if gap ( r ; s ) is known and valued as', then r and s are similar. Therefore:
Fact 3. An algorithm that computes the gap function would cover minimize the automaton.
It could be noticed that the relation is mor erestrictive than. In other words, p
q ) pq ; this
implies that the similar classes are larger or as large as the equivalence classes. The following fact is then
deduced:
Fact 4. The minimal DFCA for Lis smaller than the minimal DFA for L.
As a minimal DF Aw asrstdenedtobea canonical representation of a language, one can wonder ifa
minimalDFCAis still unique , this property being really useful when, for instance, comparing automata.
Alas, it is not the case, due to the fact that the similarity relation is not an equivalence relation. For
example, the following automata are minimal DFCA for the language L=f a ; ab ; ba ; aba ; abb ; baa ; bab g:
p
q
r
s
p
q
r
s
a
b
a
b
a
b
a
b
a
a+ b
Figure2.3: Two distinct minimal DFCA for L=f a ; ab ; ba ; aba ; abb ; baa ; bab g
15
Cover automata
Some questions may arise, rst, ìWhycannotweminimizethroughclassicalminimizationtheminimal
DFCA inorderto have a canonical form?î
The answer is simple: minimal DFCA are minimal DFA. But the minimal DFCAA is fora language
L
f
whereas the view of A as a minimal DF Awouldbeforthe language L (A), and these languages are
dierent (see Denition 14 ).
Another question could be ìHowmanydistinctautomatacananalgorithmthatmakesaminimization
through similarity classes yield?î
The following theorem answers this question:
Theorem 5 ( Câmpeanuand Paun ( 2002 )) . The number of minimal DFCA that can be obtained froma
given minimalDFAwithn states by merging the similar states in the given DFA is upper bounded by
k
0
!
(2 k
0
n +1)!
; withk
0
=
2
6
6
6
6
6
6
6
6
4 n
9+
p
8 n +1
8
3
7
7
7
7
7
7
7
7
Moreover, this bound is reached, i.e. for any given positive integernwecannda minimal DFA withn
states, which has the number of minimal DFCA obtained by merging similar states equal to this maximum.
This is a penalty one has to deal with when working with cover automata. Another penalty is the
following: when one considers a minimal automaton, the sink state is usually not represented. However,
a minimization algorithm should, in theory, return a complete automaton. A drawback of cover automata
is that the sink state could be merged with another state, leading to the creation of transitions thata
trimmed automaton would not have:
p
q
r
f p ; s g
q
r
a
b
a
b
s
a
b
a+ b
a+ b
a
b
Figure2.4: Merge of the sink state
Chapter3
Cover minimization
This chapter presentstwo algorithms that make cover minimization, that is to say, transform a DF AA into
a minimal DFCAforL (A). The historicallyrst algorithm presented in the original paper ( Câmpeanu
et al. ( 2001 )), with a time complexity in O ( n4 ), has a messy theoretical study; thus, we will not treat it,
as the original authors published in the same breath of therstpaper another one that detailsa O ( n2 )
algorithm.
3.1 A O ( n2 ) algorithm
This algorithm has been presented by Paunetal. ( 2001 ) and is a signicant improvement of the previous
one of Câmpeanuetal. ( 2001 ). As seen in Fact 3 , computing the gap function is away to compute the
minimalDFCA. It is proposed, as in the veryrstalgorithm, to compute eciently this function.
3.1.1 Denitions
We assume that A=h Q ; ; '; I ; F i is a DFA accepting a nite language Lwhoselongest words have
length'. Ais complete and without any useless state except the sink state.
One may refer to Paunetal. ( 2001 ) for the proofs of the following lemmas and theorems.
Denition 19 (Range) . For p ; q 2 Q2 and p , q , we dene
range
( p ; q ) ='max ( level ( p ) ; level ( q ))
Intuitively, if w
p
and w
q
are the shortest words that lead to, respectively, p and q , range ( p ; q ) is the
maximum length of aword w such thatj w
p
w j'andj w
q
w j'.
Denition 20 (State failure) . Let p ; q 2 Q and z 2
. We say that p and qfail on z if' ( p ; z ) 2 F and
' ( q ; z ) 2 Q n F or vice versa, andj z j range ( p ; q ).
Theorem 6. p / qifandonlyif the reexistsz 2
suchthatpandq fail onz.
With those denitions, gap can be re-denedasfollows:
Denition 21 (Gap) . If p / q ,
gap
( p ; q ) =minfj z jsuchthat p and q fail on z g
17
Cover minimization
Theorem 7. 1. Letdbethesink state of A . If level ( d ) >' , then 8 q 2 Q nf d g; d
q. If level ( d ) ' ,
then 8 f 2 F ; d / fand gap ( d ; f ) =0 .
2. Ifp 2 Fandq 2 Q n F nf d g then p / qand gap ( p ; q ) =0 .
Lemma1. Let p ; q 2 Q2 ; p , qandr =' ( p ; a ) , t =' ( q ; a ) , for somea 2
, then range ( p ; q )
range
( r ; t ) +1 .
The following theorems are the crux of the algorithm:
Theorem 8 ( Paunetal. ( 2001 )) . Letpandqbetwo states such that eitherp ; q 2 For p ; q 2 Q n F. Then
p / qifandonlyif the reexistsa 2
such that ' ( p ; a ) = rand ' ( q ; a ) = t ; r / t, and
gap
( r ; t ) +1
range
( p ; q )
Theorem 9 ( Paunetal. ( 2001 )) . Ifp / qsuchthatp ; q 2 For p ; q 2 Q n F, then
gap
( p ; q ) =minf gap ( r ; t ) +1 j' ( p ; a ) = rand ' ( q ; a ) = t ; fora 2
; r / t ; and gap ( r ; t ) +1
range
( p ; q ) g
3.1.2 Algorithm
The presented algorithm assumes that its input automaton is ordered , which means that the states are
numbered from 0 to n , and that the last one is the sink state.
The Theorem 9 naturally leads to the following algorithm:
1
Input: An ordered, reduced and complete DF AA=h Q ; ; '; I ; F i, with n +1 states,which accepts a nite
languageLwhose longest words havelength'
3
Output: gap ( i ; j )for each pair i ; j 2 Q and i < j :
Algorithm:
5
1. for each i 2 Q {
compute level ( i ) }
7
2. for i =0 to n 1{
gap ( i ; n ) ='}
9
if level ( n ) '{
for each i 2 F {
11
gap ( i ; n ) =0}}
3. for each pair i ; j 2 Q nf n gsuchthat i < j {
13
if i 2 F and j 2 Q n F or vice versa{
gap ( i ; j ) =0
15
} else {
gap ( i ; j ) ='
17
}
}
19
4. for i = n 2 down to 0{
for j = n down to i +1{
21
for each a 2
{
let i0 =' ( i ; a ) and j0 =' ( j ; a )
23
if i0 , j0 {
g = if ( i0 < j0 ) then gap ( i0 ; j0 ) else gap ( j0 ; i0 )
25
if g +1 range ( i ; j ) {
gap ( i ; j ) =min(gap ( i ; j ) ; g +1)
27
}}}}}
3.1 A O ( n2 ) algorithm
18
The time complexity is established as follows:
Each of Step1and Step 2 is O ( n ),
Step 3takesO ( n2 ) iterations,
Step 4hastwonested loops, each of which has O ( n ) iterations. Each inner iteration is O (jj), where
jjisaconstant.
Therefore, this algorithm that computes the gap function is in O ( n2 ).
3.1.3 Example
Let A=h Q ; ; '; i ; F i be the automaton denoted by Figure 3.1 . Clearly, L (A) =f abc ; ababc ; abababc g
and'=7.
We suppose the existence of a sink state q
9
, not represented here for the sake of clarity.
q
1
q
2
q
3
q
4
q
5
q
6
q
7
q
8
a
b
a
b
a
b
c
c
c
Figure3.1: ADFA for L (A) =f abc ; ababc ; abababc g
We follow the algorithm of Section 3.1.2 that computes the gap function. At Step 1, level ( q ) is calculated
for each q 2 Q :
State q
1
q
2
q
3
q
4
q
5
q
6
q
7
q
8
q
9
Level
0
1
2
3
4
5
6
3
1
After Step2, Step 3 and Step 4, we have gap ( q
i
; q
j
) for each 1
i
8; 2
j
9and i < j as follows:
q
2
q
3
q
4
q
5
q
6
q
7
q
8
q
9
q
1
2
1
2
1
2
1
0
3
q
2
1
7
1
7
1
0
2
q
3
1
7
1
7
0
1
q
4
1
7
1
0
2
q
5
1
7
0
1
q
6
1
0
2
q
7
0
1
q
8
0
19
Cover minimization
In the above table, the states are equal if gap ( q
i
; q
j
) =7, ell being 7. The merging of similar states
results in the follo wing minimal DFCA for L (A) (the sink state is still not represented):
f q
1
g
f q
2
; q
4
; q
6
g
f q
3
; q
5
; q
7
g
f q
8
g
a
b
c
a
Figure3.2: MDFCA for L (A) =f abc ; ababc ; abababc g
3.2 A O ( n log ( n )) algorithm
This algorithm, which has been presented by KËrner ( 2003 ), is a major and the last improvement in cover
minimization. Inspired by the algorithm of Hopcroft ( 1971 ), it is unlikely that an algorithm withasmaller
time complexity will be found as it equals the better one for classical minimization.
3.2.1 Algorithm
Basically, it uses Hopcroft'salgorithmwith regard to abounded evaluation of each states.
The algorithm can be expressed as the following to emphasize the similarities with Hopcroft'salgo-
rithm:
1
initialize
=f F ; Q n F g,
put ( F ; 0) ina"Tot re at"queue T ,
3
while Tis not empty{
remove as et ( S ; k ) of T ,
5
for each letter in
{
compute X : the list of states thatgoin S ,
7
Y : the list of statesthatdon0t,
with level ( p ) + k
'
9
for each sets
in
{
if
\ X , ; and
\ Y , ; {
11
split
in
\ X and
\ Y ,
enqueue the smaller one in T with k +1.
13
}
}
15
}
}
3.2 A O ( n log ( n )) algorithm
20
Themodications from the original algorithm are the following:
l : 2: Thenal states are enqueued together witha 0; this number represents the distance from the
states enqueuedto anal state,
l : 4: k , this distance, is removed with the set of states,
l : 8: The states are considered with regard to their level and the distance from them to anal state
( k ); it results in consideringa state if and only if the length of the word that led toit is less than'.
l : 12: The new set in enqueuedwithan incremented distance to thenal states.
As the computing of the level function is in O ( n ), the whole algorithm is still in O ( n log ( n )).
3.2.2 Example
This example considers the following automaton B=h Q ; ; '; i ; F i which recognizes the language
L (B) =f bc ; babc gwith'=4.
We suppose the existence of a sink state q
6
not represented in thegure.
q
1
q
2
q
3
q
4
q
5
b
a
b
c
c
Figure3.3: ADFA for L (B) =f bc ; babc g
One of the rst iteration will consider S =f q
5
g; k =0, with the input letter c . The set f q
1
; q
2
; q
3
; q
4
; q
6
g
will be split into f q
2
; q
4
gandf q
1
; q
3
; q
6
g, and the latter will beenqueued together with k +1=1.
Another iteration will then consider S =f q
2
; q
4
g; k =1 andtheinput letter b . The set f q
1
; q
3
; q
6
gis
split into f q
1
; q
3
gandf q
6
g, andff q
6
g; k +1=2 gwillbeenqueued.
If we now consider S =f q
6
g; k =2 andtheinput letter a , the list of states going in S is X =
f q
1
; q
3
; q
4
; q
5
; q
6
g. This X would split the set f q
2
; q
4
gif q
4
was in the actual X : indeed, level ( q
4
) + ( k =
2) =5
', so q
4
is not in X .
The algorithm will not produce anymore split, and the resulting automaton is the following:
f q
1
; q
3
g
f q
2
; q
4
g
f q
5
g
c
b
a
Figure3.4: MDFCA for L (B) =f bc ; babc g
Chapter4
Implementation in the V aucanson
framework
The two algorithms of Section 3.1 and Section 3.2 have been implemented using the Vaucansonframe-
work. In this chapter, we present those implementations, together with a study of their respect ive perfor-
mances.
4.1 Implementations
4.1.1 The O ( n2 ) algorithm
The implementation is quite straightforward, it uses only basic functionalities of Vaucanson. TheìStepîs
in comment refer to the ones of Section 3.1 .
void compute_gaps ( con st automaton_t&
a ut,
2
unsigned
l,
gaps_t&
gaps,
4
vstates_t&
states,
levels_t&
levels)
6
{
cons t alphabet_t&
alphabet=aut. series () . mon oid () . alphabet () ;
8
// Step 1.
10
for_each_initial_state (istate, aut)
compute_levels (aut, i state, 0, levels) ;
12
// Get the sink state
14
hstate_tsink=get_sink_state (aut) ;
16
// Compute Q
sink
states. clear () ;
18
for_each_state (istate, aut)
if (is tate! =sink)
20
states. push_back (is tate) ;
4.1 Implementations
22
// Step 2.
22
for_all (vstates_t, i state, states)
gaps [istate][sink]=l;
24
if (levels[sink]<=l)
26
for_each_final_state (istate, a ut)
gaps [istate][sink]=0;
28
// Step 3.
30
for_all (vstates_t, i, states)
for_all (vstates_t, j, states) {
32
if ( not (i<j) )
continue ;
34
if ( (aut. is_final (i) and not a ut. is_final (j) ) or
( not aut. is_final (i) and aut. is_final (j) ) )
36
gaps[i][j]=0;
else
38
gaps[i][j]=l;
}
40
// Now add the sink state at the end of the con ta in er.
42
states. push_back (sink) ;
44
// Step 4.
delta_tdelta;
46
vstates_t: : reverse _iteratori=states. rbegin () ;
for (++i, ++i; i! =states. rend () ; ++i) {
48
for (vstates_t: : reverse _iteratorj=states. rbegin () ;
j! =i; ++j)
50
for_each_letter (iletter, alphabet) {
delta. clear () ;
52
a ut. letter_deltac (delta, i, i letter, delta_kind: : states () ) ;
hstate_tip= (delta. begin () ) ;
54
delta. clear () ;
a ut. letter_deltac (delta, j, i letter, delta_kind: : states () ) ;
56
hstate_tjp= (delta. begin () ) ;
58
if (ip! =jp) {
unsigned g= (ip<jp) ? gaps[ip][jp]: gaps[jp][ip];
60
if (g+1<=range (i, j) ) {
gaps[i][j]=std: : min (gaps[i][i], g+1) ;
62
}}}}}
23
Implementation in the V aucanson framework
4.1.2 The O ( n log ( n )) algorithm
This implementation of the algorithm presented in Section 3.2 computes similarity states decompositions
(SSD). Only the relevant portions of code are kept here. The comments are putto help the reader know
wherein algorithm of Section 3.2 we are.
Thanks to Vaucanson, this implementation is a great improvement of the original 400 lines implemen-
tationpresented in KËrner ( 2003 ).
void compute_ssd (automaton_t&aut,
2
unsigned
l,
levels_t&
levels,
4
ssds_t&
Qs)
{
6
// Compute level (q) forallq < Q
levels. clear () ;
8
for_each_initial_state (istate, aut)
compute_levels (aut, i state, 0, levels) ;
10
// Q (0) = Q \ F; Q (1) = F
12
for_each_state (istate, aut)
if ( not a ut. is_final (istate) )
14
Qs[0]. insert (istate) ;
else
16
Qs[1]. insert (istate) ;
18
unsigned
r=2;
// ssdsindex.
ssd_queue_t
T;
// FIFOqueueforsplitting.
20
hstates_t
X;
hstates_t
Y;
22
delta_t
delta;
cons t alphabet_t&
alphabet=aut. series () . mon oid () . alphabet () ;
24
// InitializeTwith (F, 0)
26
T. push (ssd_pair_t (hstates_t () , 0) ) ;
for_all (hstates_t, i state, Qs[1])
28
T. front () . first. insert (istate) ;
30
// Main loop
while ( not T. empty () ) {
32
// Firstelementof TisT. front ()
ssd_pair_tS_k=T. front () ;
34
T. pop () ;
36
for_each_letter (iletter, alphabet) {
// X = {p | delta (p, i letter) <
Sand level (p) + k < l}
4.1 Implementations
24
38
// Y = {p | delta (p, i letter) not < Sand level (p) + k < l}
X. clear () ;
40
Y. clear () ;
for_each_state (istate, aut) {
42
delta. clear () ;
a ut. letter_deltac (delta, i state, i letter,
44
delta_kind: : states () ) ;
// w. r. tlevelandk.
46
if (levels[istate]+S_k. second<l) {
boo l b_is_in_S= true ;
48
for_all (delta_t, isucc, delta)
if (S_k. first. find (isucc) ==S_k. first. end () ) {
50
b_is_in_S= false ;
break ;
52
}
if (b_is_in_S)
54
X. insert (is tate) ;
else
56
Y. insert (is tate) ;
}}
58
60
for ( int i=r
1; i>=0; i) {
hstates_t
Qi_inter_X, Qi_minus_Qi_inter_X;
62
boo l
b_Qi_inter_Y_empty= true ;
constunsigned n_Qi_size=Qs[i]. size () ;
64
// ComputeQiinterX, Qiminus XandQiinterY
66
for_all (hstates_t, i state, Qs[i]) {
if (X. find (is tate) ! =X. end () )
68
Qi_inter_X. insert (istate) ;
else
70
Qi_minus_Qi_inter_X. insert (istate) ;
if (Y. find (is tate) ! =Y. end () )
72
b_Qi_inter_Y_empty= false ;
}
74
// ifQiinterX! = 0and QiinterY! = 0
76
if ( not Qi_inter_X. empty () and not b_Qi_inter_Y_empty) {
// Zis Qi_inter_X
78
// If | Z |<=| Qi \ Z |
if (Qi_inter_X. size () <=Qi_minus_Qi_inter_X. size () ) {
80
Qs[r]=Qi_inter_X;
Qs[i]=Qi_minus_Qi_inter_X;
82
} else {
Qs[r]=Qi_minus_Qi_inter_X;
84
Qs[i]=Qi_inter_X;
}
86
assert (Qs[i]. size () +Qs[r]. size () ==n_Qi_size) ;
T. push (ssd_pair_t (Qs[r], S_k. second+1) ) ;
88
r+=1;
}}}}}
25
Implementation in the V aucanson framework
4.2 Performances
At the moment this report is written, performance tests are not deeply made. A rstresultisthefollo wing:
Protocol:
Let Lbealanguage composed of random words over, with=f a ; b g,
Let Abeadeterministic automaton that recognizes L,
We will compute the minimal automaton for A, and a minimal cover automaton for it.
The results are the follo wing:
Cardinal
Time ( s )
A
Words Min. DF AMin. DFCAHopcroftO ( n2 ) O ( n log ( n ))
55
20
37
30
0.01
0.01
0.02
412
40
172
140
0.1
0.9
0.5
963
60
498
440
0.2
3.0
1.1
1418
80
742
698
0.5
6.4
3.1
2437
100
1481
1323
0.9
34.5
9.2
More tests and comments on performances are to be made in the next few months. Câmpeanuetal.
plan to make their tests too, but the best protocol to do so is not reallydenedat this time: real-world
applications tend to not give good results and we have to focus on specicusesofcov er automata.
Conclusion
In this technical report, we presented the theory of cover automata, anecientwayto represent nite
languages. This eld try to fulllarealneed from the world of language processing, which commonly
uses nite languages.
Cover automata are useful thanks to algorithms thatìcoverminimizeîautomata, giving an automaton
with the same language of the input one, modulo the size of the words recognized.
The best algorithm known is in O ( n log ( n )) and is inspired by Hopcroft'salgorithm, which is the most
performant algorithm for classical minimization. It is rather unlikely that this bound would be improved,
and it is assumed to be tight for both minimizations.
Ongoing researches in this eld are focused on the test of performance: scientists are interested in
knowing in which cases cover automata cansignicantly reduce the number of states of an automaton.
Bibliography
Campeanu, C., II, K. C., Salomaa, K., and Yu, S. (1999). State complexity of basic operations on nite
languages. In WIA , pages 60ñ70.
Campeanu, C., Paun, A., and Yu, S. (2002). Ane cient algorithm for constructing minimal cover
automata for nite languages. International Journal of Foundations of Computer Science (IJFCS) ,
13(1):83ñ??
Câmpeanu, C. and Paun, A. (2002). The number of similarity relations and the number of minimal
deterministic nite cover automata. In CIAA , pages 67ñ76.
Câmpeanu, C., Paun, A., and Yu, S. (2002). Ane cient algorithm for constructing minimal cover
automata for nite languages. International Journal of Foundations of Computer Science (IJFCS) ,
13(1):83ñ??
Câmpeanu, C., Sântean, N., and Yu, S. (2001). Minimal cover-automata for nite languages. Theor.
Comput. Sci. , 267(1-2):3ñ16.
Hopcroft, J. E. (1971). An n log n algorithm for minimizing the states in a nite-automaton. In Kohavi,
Z., editor, Theory of Machines and Computations , pages 189ñ196. Academic Press.
KËrner, H. (2003). A time and space ecient algorithm for minimizing cover automata for nite languages.
International Journal of Foundations of Computer Science (IJFCS) , 14(6):1071ñ??
Moore, E. (1956). Gedanken-experiments on sequential machines. In Shannon, C. and McCarthy, J.,
editors, Automata Studies , pages 129ñ153. Princeton University Press, Princeton, NJ.
Myhill, J. (1957). Finite automata and the representation of events. Technical Report 57-624, WADC.
Paun, A., Sântean, N., and Yu, S. (2001). An o ( n2 ) algorithm for constructing minimal cover automata
for nite languages. In CIAA'00: Revised Papers from the 5th International Confer ence on Implemen-
tationand Application of Automata , pages 243ñ251, London, UK. Springer -Verlag.
Sakarovitch, J. (2003). â¦lÃmentsdetheoriedesautomates . â¦ditions Vuibert. Table of Contents, preface
and introductions to chapters available athttp: //perso.enst.fr/jsaka/ETA/.
Wood, D. and Yu, S., editors (1998). Automata Implementation, Second International Workshop on
Implementing Automata, WIA'97, London, Ontario, Canada, September 18-20,1997, Revised Papers ,
volume 1436of Lecture Notes in Computer Science . Springer.
Yu (1999). State complexity of regular languages. In IWDCAGRS: Proceedings of the International
Workshop on Descriptional Complexity of Automata, Grammars and Related Structures .
List of Figures
1.1 Automataforìa*î ...................................... 6
1.2 Automaton forì(a+b)cî .................................. 7
1.3 Minimalautomaton forì(a+b)cî ............................. 8
1.4 TheautomatonM ...................................... 9
1.5 Minimalautomaton for L (M) ............................... 10
2.1 Coverautomaton for L=f a ; b ; aa ; aaa ; bab g; '=3 .................... 12
2.2 Automaton forì(a+b)aî .................................. 13
2.3 Twodistinct minimal DFCA for L=f a ; ab ; ba ; aba ; abb ; baa ; bab g ............ 14
2.4 Mergeofthesink state ................................... 15
3.1 ADFAforL (A) =f abc ; ababc ; abababc g ......................... 18
3.2 MDFCAforL (A) =f abc ; ababc ; abababc g ........................ 19
3.3 ADFAforL (B) =f bc ; babc g ................................ 20
3.4 MDFCAforL (B) =f bc ; babc g ............................... 20
Index
Alphabet, 5
Automaton, 5 , 5
completeñ, 6
coverñ, see DFCA
deterministic niteñ, see DFA
niteñ, 5
isomorphism ofñ, 7
minimalñ, see MDFA
minimal deterministicñ, see MDFA
minimal deterministiccov erñ, see MDFCA
non-deterministic niteñ, see NFA
real-timeñ, 5
Complexity, 18
Cover automata, 4
Cover minimization, 13 , 14 , 16 , 19
DFA, 6 , 13
DFCA, 12 , 13 , 14
Equivalency, 7
Evaluation, 6
Gap, 13 , 16
gap, 18
Hopcroft'salgorithm, 19
Hopcroft'salgorithm, 10
Language, 5
cover, 13
nite, 5 , 16
of an automaton, 5
Level, 14
MDFA, 6 , 7 , 8 , 13
MDFCA, 13 , 14 , 14
Minimization, 8 , 10
coverñ, see Cover minimization
Moore'salgorithm, 8
NFA, 6
Range, 16
Similarity, 13
State failure, 16
Trim, 6
Vaucanson, 21 , 23
Words, 5
29