Lambda Calculus and Formal Grammar

Sep 15, 2014

PSPACE-completeness of LCFRS universal recognition

About a month ago, Marco Kuhlmann asked on twitter whether the universal recognition problem for non-permuting LCFRSs is PSPACE-complete. (Non-permuting LCFRSs were called monotone LCFRSs by Michaelis and by Kracht. They do not allow arguments of the same nonterminal on the right-hand side of a rule to appear in reversed order on the left-hand side.) Good question. Kaji et al. (1992) proved that the universal recognition for arbitrary LCFRSs is PSPACE-complete by reduction from an arbitrary language in PSPACE. The LCFRS that is the output of the reduction is not non-permuting, and converting a given LCFRS to a non-permuting normal form may result in an exponential blowup in size, so this result by itself does not answer Marco's question. I couldn't give an answer immediately because I was not familiar with the details of Kaji et al.'s proof. I found the proof not so easy to read, but, after spending a few days trying to understand it, I found that it is easy to modify their construction to make the output always non-permuting. So the answer to Marco's question is, yes, it is PSPACE-complete.

Kaji et al.'s construction is designed to simulate an accepting computation of a polynomial-space-bounded Turing machine by a derivation of an LCFRS of rank 1 and dimension (or fan-out or arity) proportional to the the number of tape cells used in the computation. Although this construction works for arbitrary polynomial-space-bounded Turing machines, the reduction is from a fixed language $L$ in PSPACE to the universal recognition problem for general LCFRSs, i.e., the reduction transforms a string $w$ to a pair $(G_w,z_w)$ consisting of an LCFRS and a string such that $w \in L$ if and only if $z_w \in L(G_w)$. So you work with a fixed Turing machine $M$ which accepts $L$ and operates in space $p(n)$, a fixed polynomial function of $n$. The important point is that the function $p(n)$ doesn't vary from input to input. (Otherwise the reduction can't be carried out in polynomial time.)

Every nonterminal $A$ of $G_w$ except the start nonterminal $S$ has the same dimension $d$, which is a constant multiple of $p(|w|)$, the space used in a possible accepting computation on input $w$. A tuple that is a possible argument of nonterminal $A$ consists of two parts—most of the components of the tuple represent the contents of $M$'s tape, and the remaining components (of which there is just one in Kaji et al.'s original construction) act as a kind of "trash can". Each cell of $M$'s tape is represented by a tuple consisting of $\epsilon$s and 1s. Although there are more efficient coding, it suffices to take a tuple of the form \[ \epsilon,\dots,\epsilon,1,\epsilon,\dots,\epsilon \] whose length $t$ equals the size of the tape alphabet of $M$. The position of the unique occurrence of $1$ tells what symbol is occupying the cell. The head position and the state of $M$ can be coded in nonterminals, so that a "derived fact" \[ A_{q,i}(\ulcorner c_1 \urcorner,\dots,\ulcorner c_{p(n)} \urcorner,\epsilon) \] represents a configuration of $M$ in which $M$ is in state $q$, its head is at the $i$th cell from the left, and the tape content is $c_1 \dots c_{p(n)}$. Here, $\ulcorner c \urcorner$ is the above tuple representation of the symbol $c$, and the final argument of $A_{q,i}$ is the trash can, which is always required to contain $\epsilon$.

A single step in $M$'s computation on $w = c_1 \dots c_n$ is accounted for by a single rule of $G_w$. Let us assume for simplicity that a single step of $M$ either just rewrites the symbol under the head (without moving) or just moves the head one cell to the left or right (without writing). (Remember that $M$ is not part of the input to reduction, so we can take any polynomial-space-bounded Turing machine we like that accepts $L$.) The first type of move corresponds to a rule of $G_w$ of the form \begin{equation} \label{rule1} A_{r,i}(\vec{x_1},\dots,\vec{x_{i-1}},\underbrace{\epsilon,\dots,\epsilon}_{l-1},x_{i,k},\underbrace{\epsilon,\dots,\epsilon}_{t-l},\vec{x_{i+1}},\dots,\vec{x_{p(n)}},x_{i,1} \dots x_{i,k-1}x_{i,k+1}x_{i,t}y) \leftarrow A_{q,i}(\vec{x_1},\dots,\vec{x_{p(n)}},y), \end{equation} and the second type of move corresponds to a rule of the form \begin{equation} \label{rule2} A_{r,i \pm 1}(\vec{x_1},\dots,\vec{x_{p(n)}},y) \leftarrow A_{q,i}(\vec{x_1},\dots,\vec{x_{p(n)}},y). \qquad\qquad\mbox{} \end{equation} Here, $\vec{x_j}$ stands for a tuple of variables $x_{j,1},\dots,x_{j,t}$ of length $t$. The change from \[ x_{i,1},\dots,x_{i,t} \] to \[ \underbrace{\epsilon,\dots,\epsilon}_{l-1},x_{i,k},\underbrace{\epsilon,\dots,\epsilon}_{t-l} \] in (\ref{rule1}) expresses the rewriting of the symbol represented by \[ \underbrace{\epsilon,\dots,\epsilon}_{k-1},1,\underbrace{\epsilon,\dots,\epsilon}_{t-k} \] with another symbol; it moves the $k$th component of the tuple to the $l$th component, while moving all the other components to the trash can.

The initial configuration of $M$ corresponds to the unique terminating rule \begin{equation} \label{rule3} A_{q_I,1}(\ulcorner c_1 \urcorner,\dots,\ulcorner c_n \urcorner,\ulcorner b \urcorner,\dots,\ulcorner b \urcorner, \epsilon) \leftarrow , \end{equation} where $q_I$ is the initial state of $M$ and $b$ stands for the blank symbol. For each final state $q$ and $i=1,\dots,p(n)$, there is the rule \begin{equation} \label{rule4} S(x_{1,1}\dots x_{1,t} \dots x_{p(n),1} \dots x_{p(n),t} \# y) \leftarrow A_{q,i}(\vec{x_1},\dots,\vec{x_{p(n)}},y), \;\mbox{} \end{equation} which concatenates all the tuples representing the tape and affixes $\#$ followed by the content of the trash can. Setting $z_w = 1^{p(n)} \# $ enforces the requirement that the string in the trash can be always empty.

Now rules of $G_w$ of type (\ref{rule1}) are not non-permuting, because variables that get moved to the trash can may move over other variables. To avoid this, an easy solution is to place trash cans to the left and to the right of every tuple representing a tape cell: \begin{equation} \tag{$1'$} A_{r,i}(y_0,\vec{x_1},y_1,\dots,\vec{x_{i-1}},y_{i-1} x_{i,1} \dots x_{i,k-1},\underbrace{\epsilon,\dots,\epsilon}_{l-1},x_{i,k},\underbrace{\epsilon,\dots,\epsilon}_{t-l},x_{i,k+1} \dots x_{i,t} y_i,\vec{x_{i+1}},y_{i+1},\dots,\vec{x_{p(n)}},y_{p(n)}) \leftarrow A_{q,i}(y_0,\vec{x_1},y_1,\dots,\vec{x_{p(n)}},y_{p(n)}). \end{equation} The dimension $d$ of nonterminals (other than $S$) has changed, so we need to adjust rules (\ref{rule2}) and (\ref{rule3}) accordingly: \begin{gather} \tag{$2'$} A_{r,i \pm 1}(y_0,\vec{x_1},y_1,\dots,\vec{x_{p(n)}},y_{p(n)}) \leftarrow A_{q,i}(y_0,\vec{x_1},y_1,\dots,\vec{x_{p(n)}},y_{p(n)}), \\ \tag{$3'$} A_{q_I,1}(\epsilon,\ulcorner c_1 \urcorner,\epsilon,\dots,\ulcorner c_n \urcorner,\epsilon,\ulcorner b \urcorner,\epsilon,\dots,\ulcorner b \urcorner, \epsilon) \leftarrow . \end{gather} To force each trash can to be empty, we now let $z_w = (\# 1 \#)^{p(n)}$ and replace (\ref{rule4}) by \begin{equation} \tag{$4'$} S(y_0 \# x_{1,1} \dots x_{1,t} \# y_1 \# x_{2,1} \dots x_{2,t} \# y_2 \dots x_{p(n),1} \dots x_{p(n),t} \# y_{p(n)}) \leftarrow A_{q,i}(y_0,\vec{x_1},y_1,\dots,\vec{x_{p(n)}},y_{p(n)}). \end{equation}

With these changes, the grammar $G_w$ is now non-permuting, and since it is of rank 1, it is also well-nested. So this shows that universal recognition for well-nested MCFGs (LCFRSs) is also PSPACE-complete, something I have always claimed without realizing that Kaji et al.'s proof needs to be modified to establish it.

Dec 3, 2013

Machine-Based Approach to Pumping

(January 1, 2014: The statement of "Harrison's theorem" in the original post was in error. The error is now corrected. See here.)

(December 4, 2013: Scroll to the end of the post for an update.)

When I was a first-year Ph.D. student at Stanford, I took an automata theory course offered by the computer science department, taught by the late Robert Floyd. The course used a draft version of a textbook he and Richard Beigel were writing, later published as The Language of Machines in 1994. The book, presumably influenced by an old paper by Dana Scott, adopted a rather unconventional approach to automata theory. A clear separation between machines and programs and abstract notions of machines were two of such unconventional features. I quite liked the book, but it would probably be good to have another textbook with more standard notations and terminology, such as Michael Sipser's.

Anyway, one of the interesting things about the draft of Floyd and Beigel's book was that they gave a proof of the pumping lemma for context-free languages using pushdown automata, rather than context-free grammars. I had already seen the standard, grammar-based proof, but I was fascinated by the PDA-based proof in the book. In the grammar-based proof, you look at a long path in a derivation tree that has two occurrences of the same nonterminal on it. In the machine-based proof, you look at a computation in which the stack exceeds a certain height. With the definition of the PDA adopted in the book, where you can push a symbol on the stack regardless of the top symbol of the stack, the height to be exceeded is simply $|Q|^2$, where $Q$ is the state set of the PDA.

A tricky thing about this proof is that it won't do to just take a very long string, because a PDA can run indefinitely long without the stack ever exceeding a given height. But this is OK because the set of strings that a PDA accepts without the stack height ever exceeding $h$ is regular, no matter what $h$ is. In the margin next to the proof in Floyd and Beigel's draft, there was a note that said, "A hard exercise for a machine-based proof." Alas, the proof was replaced by the standard, grammar-based proof in the published version of the book.

I was always fond of this machine-based proof, and I even mentioned it when I presented my paper on the pumping lemma for well-nested multiple context-free languages (PDF), first in Haifa in 2008, and then in Stuttgart in 2009. Unfortunately, I didn't keep the draft version of the book, but I'm fairly confident my reconstruction of the proof is the same as the one Floyd and Beigel had in the draft.

Recently, I had a chance to think about this proof again, when I was writing my just-finished paper for the 19th Amsterdam Colloquium. The proof of one direction of the main result of this paper uses a pumping lemma for deterministic context-free languages from Michael Harrison's book (Theorems 11.8.2 and 11.8.3). (It seems the result first appeared in Havel and Harrison's 1974 paper.)

Harrison proves his pumping lemma by analyzing derivation trees of strict deterministic grammars. I confess to not having read his proof, but it seems to me that it's much easier to prove it using deterministic PDA. In the case of deterministic PDA, you can't assume the stack is emptied before acceptance, so the bound to be exceeded in the proof is not about the stack height, but the decrease in stack height. Also, if you work with a model of PDA where the stack action is always sensitive to the top symbol of the stack, the bound becomes $(|Q|\cdot|\Gamma|)^2$, rather than $|Q|^2$, where $\Gamma$ is the stack alphabet. Other than that, the idea is essentially the same, and you can get a strong pumping property of the DCFLs easily.

I actually wrote up this machine-based proof of Harrison's pumping lemma for my paper, but I had to cut it because the paper otherwise didn't fit within the 8-page limit.

The thing is I've never seen the machine-based proof of the pumping lemma for CFL or DCFL anywhere else, and I don't know if it ever appeared in print. I noticed that Havel and Harrison refer to this 1969 paper by William Ogden. This paper is mostly about stack languages, and the author doesn't say much about PDA in it (no explicit bound to be exceeded is mentioned), but his 1968 dissertation is titled "Intercalation theorems for pushdown store and stack languages", so it probably has a more detailed treatment there. I can't find Ogden's dissertation anywhere on the web, but I hope to obtain a copy at some point. (Incidentally, Ogden's Ph.D. advisor was Dana Scott, and I think Robert Floyd became a professor at Stanford in 1969, so there's the connection.)

(Note: The proof of the pumping lemma for linear indexed languages uses a similar argument. See this paper by Shamir or here (Lecture 4).)

You can stop reading now (if you haven't already), but if you're interested, here's my write-up of the machine-based proof of (a corollary of) Harrison's theorem. (Thanks to MathJax, it's now fairly easy to copy-and-paste from a LaTeX source.)

(January 1, 2014: The statement of the following theorem in the original post was in error.)

Theorem (Harrison). Let $L \subseteq \Sigma^{\ast}$ be a DCFL. There exists a regular set $R \subseteq L$ satisfying the following property: for every $w \in L-R$, there exist $x_1,x_2,x_3,x_4,x_5$ such that

$w = x_1 x_2 x_3 x_4 x_5$;
$x_2 x_4 \neq \epsilon$;
for every $z \in \Sigma^{\ast}$ and $n \in \mathbb{N}$, $x_1 x_2 x_3 x_4 z \in L$ if and only if $x_1 x_2^n x_3 x_4^n z \in L$.

In the following proof, we write $(q,\gamma) \stackrel{a}{\Longrightarrow}_1 (q',\gamma')$ for $(a,q,\gamma) \vdash_M (\epsilon,q',\gamma')$, and $(q,\gamma) \stackrel{w}{\Longrightarrow} (q',\gamma')$ for $(w,q,\gamma) \vdash_M^{\ast} (\epsilon,q',\gamma')$. Note that $(q,\gamma) \stackrel{w}{\Longrightarrow} (q',\gamma') \stackrel{w'}{\Longrightarrow} (q'',\gamma'')$ implies $(q,\gamma) \stackrel{ww'}{\Longrightarrow} (q'',\gamma'')$.

Let $M = (Q,\Sigma,\Gamma,\delta,q_0,Z_0,F)$ be a deterministic PDA that recognizes $L$. We may suppose without loss of generality that that the computation of $M$ never blocks or enters into an infinite loop, and $(p,\gamma) \in \delta(q,a,Y)$ implies $\gamma \in \{ \epsilon \} \cup \Gamma \cup \Gamma Y$. Let $h = (|Q| \cdot |\Gamma|)^2$. We divide $L$ into two sets $L_0, L_1$, where $L_0$ is a regular set, as follows.
\begin{gather*} L_1 = \{\, w \in L \mid w = w_1 w_2 w_3, (q_0,Z_0) \stackrel{w_1}{\Longrightarrow} (q,\alpha\beta) \stackrel{w_2}{\Longrightarrow} (r,\beta), |\alpha| = h, |\beta| \geq 1 \,\},\\ L_0 = L - L_1. \end{gather*} The reason that $L_0$ is regular is that if the stack height never shrinks by more than $h$ symbols, then all but the top $h$ symbols on the stack can be discarded, and the resulting bounded-height stack can be simulated by the finite control.

Suppose $w \in L_1$. Let $m$ be the number of steps that $M$ takes until it accepts $w$. Let $\gamma_0 = Z_0$, the initial stack content, and for each $i=1,\dots,m$, let $q_i$ and $\gamma_i$ be the state and the stack content of $M$ after the $i$th step. We can write $w = a_1 \dots a_m$, with $a_i \in \Sigma \cup \{ \epsilon \}$, so that
\[ (q_{i-1},\gamma_{i-1}) \stackrel{a_i}{\Longrightarrow}_1 (q_i,\gamma_i). \] Since $w \in L_1$, we must have $\gamma_k = \alpha \gamma_l$ and $\gamma_l \neq \epsilon$ for some $k < l \leq m$ and $\alpha \in \Gamma^h$. We may suppose $l$ to be the least number with this property. This implies $|\gamma_i| \leq |\gamma_k|$ for $i=0,\dots,l$. Now for each $i \in [0,h]$, we mark the last moment up to $k$ as well as the first moment beyond $k$ when the stack has height $|\gamma_l|+i$. More precisely, for each $i \in [0,h]$, let $f(i)$ be the greatest number in $[0,k]$ such that $|\gamma_{f(i)}| = |\gamma_l|+i$, and $g(i)$ be the least number in $[k,l]$ such that $|\gamma_{g(i)}| = |\gamma_l|+i$. Because of the assumption about $\delta$, the stack height changes by at most 1 at each step, and $f(i)$ and $g(i)$ are always defined. Because of the way $f$ and $g$ are defined, $f(i) < j < g(i)$ implies $|\gamma_j| > |\gamma_{f(i)}| = |\gamma_{g(i)}| = |\gamma_l| + i$. (Update April 14, 2016: Thanks to Johannes Osterholzer for spotting an error here in the original post, which is now corrected.) For $i \in [0,l]$, let $Z_i$ be the top symbol of $\gamma_i$. Since $h = (|Q| \cdot |\Gamma|)^2$, there must be $i,j \in [0,h]$ such that $i < j$ and \[ (q_{f(i)},Z_{f(i)}) = (q_{f(j)},Z_{f(j)}), \quad (q_{g(j)},Z_{g(j)}) = (q_{g(i)},Z_{g(i)}). \] This implies \[ \gamma_{f(i)} = \gamma_{g(i)},\quad \gamma_{f(j)} = \gamma_{g(j)} = \delta \gamma_{f(i)} \] for some $\delta \in \Gamma^+$. Let \[ x_1 = a_1 \dots a_{f(i)},\quad x_2 = a_{f(i)+1} \dots a_{f(j)},\quad x_3 = a_{f(j)+1} \dots a_{g(j)},\quad x_4 = a_{g(j)+1} \dots a_{g(i)}. \] Then we have \[ (q_{f(i)},\gamma_{f(i)}) \stackrel{x_2^n}{\Longrightarrow} (q_{f(i)},\delta^n\gamma_{f(i)}) \stackrel{x_3}{\Longrightarrow} (q_{g(j)},\delta^n\gamma_{f(i)}) \stackrel{x_4^n}{\Longrightarrow} (q_{g(j)},\gamma_{f(i)}) \] for all $n \geq 0$. Since $M$ never enters into an infinite loop, we must have $x_2 \neq \epsilon$. Let $w'$ be such that $w = a_1 \dots a_l w'$ and let $x_5 = a_{g(i)+1} \dots a_l w'$. It is now easy to verify the conditions 1–3 of the theorem.

Update (December 4, 2013). It has occurred to me that the difficulty with the machine-based approach lies in the proof of Ogden's lemma (what Harrison calls the "iteration theorem"). What I called "Harrison's theorem" above is really a weakened form corresponding to the usual pumping lemma. This paper by King says "Ogden [in his Ph.D. thesis] gave an iteration theorem for the family of deterministic context-free languages" and Harrison and Havel gave "a new proof of Ogden’s result for […] deterministic context-free languages." But this 1985 technical report by Harrison says:

"The key to giving a rigorous proof for deterministic iteration theorems is to use trees, not machines. This was a problem in [Ogden's Ph.D. thesis] where a deterministic iteration theorem was first expressed but not proved rigorously. It turns out to be easy to do with strict deterministic grammars." (page 9)

Problem 5 in Section 6.2 of Harrison's book asks for "a pda proof of the iteration theorem". Intriguingly, the next problem (Problem 6) is posed as a question "Using the result of Problem 5, can you find an iteration theorem for deterministic context-free languages?"

Correction (January 1, 2014). The statement of "Harrison's theorem" in the original post was in error and read as follows:

Theorem (Harrison). Let $L \subseteq \Sigma^{\ast}$ be a DCFL. There exists a positive integer $p$ satisfying the following property: for every $w \in L$ with $|w| \geq p$, there exist $x_1,x_2,x_3,x_4,x_5$ such that

$w = x_1 x_2 x_3 x_4 x_5$;
$x_2 x_4 \neq \epsilon$;
for every $z \in \Sigma^{\ast}$ and $n \in \mathbb{N}$, $x_1 x_2 x_3 x_4 z \in L$ if and only if $x_1 x_2^n x_3 x_4^n z \in L$.

This is not even true; $L = \{\, w \in \{ a,b \}^{\ast} \mid \#_a(w) \geq \#_b(w) \,\}$ and $w = a^p$ constitute a counterexample. It seems I somehow managed to mix up the two versions of Harrison's iteration theorems that apply to two different classes of languages. The main result of my Amsterdam Colloquium paper is not affected, since it suffices to be able to find one $w \in L$ that satisfies the conditions of the theorem.

Dec 20, 2010

Incompleteness via Tarski's Theorem

Many textbooks, including the one I'm using, present an "easy" proof of Gödel's First Incompleteness Theorem via Tarski's Theorem on undefinability of truth, before presenting a proof more in line with Gödel's original proof. Tarski's Theorem (or a version thereof) states that the set TR of Gödel numbers of sentences of first-order arithmetic that are true in the standard model is not "arithmetical", i.e., is not definable by any formula of first-order arithmetic. In contrast, the set PV of Gödel numbers of sentences provable in any recursively axiomatized theory is arithmetical and has a Σ₁ defining formula. Therefore, this proof goes, the two sets can't be equal, and there must exist a true sentence that is not provable, unless the theory proves some false sentence.

In my logic class, I thought of mentioning the proof via Tarski's Thorem before going into any technicalities about arithmetization. I was going to actually prove Tarski's Theorem and only state that PV is arithmetical, leaving the proof of the latter fact for the next few lectures.

But then I realized the problem with this approach — Tarski's Theorem is not that easy to prove. The proof given in Keisler et al.'s textbook uses (a weak version of) the Fixed Point Theorem, which depends on the fact that the "diagonal function" is arithmetical, which in turn relies on the fact that all computable functions are arithmetical, and ultimately rests on the existence of the Gödel beta function. In other words, this proof involves almost all machinery used in Gödel's original proof.

It is possible to prove the arithmetical definability (or representability) of the diagonal function (or some such devices needed to prove Tarski's Theorem) without using the beta function, as is done in Boolos and Jeffrey's book (at least until the third edition), but this depends on the details of the Gödel numbering of formulas, and does not work with the method used in Keisler et al.'s book.

So, rather than prove Tarski's Theorem, what I did in my class was to prove the arithmetical undefinability of the "satisfaction relation"

SAT = { (#(A(x)), n) | A(n) is true },

which holds of all pairs whose first component is the Gödel number of a formula and whose second component is a number that the formula is true of in the standard model. (Here, n is the numeral representing the natural number n.) This has an easy three-line proof, does not require the diagonal function on natural numbers, and, as noted here, makes no assumption on the nature of the Gödel numbering (except injectivity) or on the vocabulary from which formulas are built.

To prove the incompleteness using the undefinability of SAT, all you need is to prove the arithmetical definability of the "provable satisfaction" relation

PV₂ = { (#(A(x)), n) | A(n) is provable },

which, of course, hardly requires any more work than showing the arithmetical definability of the set PV of Gödel numbers of provable sentences. If no false sentence is provable, then PV₂ must be a proper subset of SAT, and it follows that there is a true sentence that is not provable.

This is of course a rather trivial point, but I find it a little curious that the undefinability of SAT does not seem to be mentioned very often, not even in a book like Smullyan's Gödel's Incompleteness Theorems.

May 22, 2010

Labeled Bracketings and the Chomsky-Schützenberger Theorem

I always thought that the Chomsky-Schützenberger Theorem is an easy consequence of the fact that for every CFG G, the strings generated by G are precisely the yields of the parse trees of G. Until recently, I did not have a good understanding of just how easy a consequence it is.

The theorem is usually stated as follows:

Theorem (Chomsky-Schützenberger). For every context-free language L, there exist a positive integer n, a homomorphism h, and a regular set R such that

L = h(D_n ∩ R),

where D_n is the Dyck language over the set Δ_n of n pairs of parentheses.

In fact, the theorem can be strengthened to require that the set R be local in the sense that there are sets J₁, J₂ ⊆ Δ_n and I ⊆ Δ_n² such that

R = (J₁ Δ_n^* ∩ Δ_n^* J₂) − Δ_n^* (Δ_n² − I) Δ_n^*.

This strengthening is often not mentioned in textbooks; it is explicitly stated in Chomsky's (1963) "Formal properties of grammars". Another fact mentioned by Chomsky (1963) is that D_n may be replaced with the set D′_n of Dyck primes, which are elements of D_n whose first and last symbols form a matching pair of parentheses. (Also, the homomorphism h is in fact alphabetic in the sense that it maps each symbol in Δ_n either to a symbol or to the empty string, but this is always clear from the proof.)

I had thought that the set D_n ∩ R in the statement of the theorem can just be taken to be an alternative representation of the set PT(G) of parse trees of any CFG G for L. This is essentially correct, but not literally true. Chomsky expressed the qualification as restrictions on the rules of G (see below).

Proofs of the theorem given in some textbooks (e.g., Salomaa 1972 and Berstel 1979) fail to make any connection between D_n ∩ R and PT(G), thus bypassing a main concern for Chomsky. The proof in Kozen's (1997) Automata and Computability—perhaps the clearest textbook treatment of the theorem—makes a clear connection between the two sets without explicitly mentioning it. (Another easy line of approach to the theorem, via pushdown automata (e.g., Harrison 1978) only makes an indirect connection between the two sets, through PDA computations.)

Here's how I came to view the connection. Recall that a set L of trees over some finite alphabet Γ of labels is local if there are a set J ⊆ Γ and a finite set I ⊆ Γ⁺ such that any tree T is in L if and only if it satisfies the following conditions:

the label of the root is in J, and
for every node, the string obtained by concatenating its label with the labels of its children is in I.

As is well-known, the set PT(G) is local for any CFG G.

Let τ be the obvious translation that maps a tree to its labeled bracketing representation—that is to say, if T is a tree with root label σ and immediate subtrees T₁, …, T_n, then τ(T) = [_σ τ(T₁) … τ(T_n) ]_σ. If T is a tree over Γ with |Γ| = n, then τ(T) is an element of D′_n, where each pair of parentheses in Δ_n corresponds to a symbol in Γ. Let us call a set L of trees over Γ super-local if τ(L) = D′_n ∩ R for some local set R ⊆ Δ_n^*. It is clear that a set L of trees is super-local if there are sets J₁, J₂ ⊆ Γ, and I₁, I₂, I₃ ⊆ Γ² such that for every tree T, T is in L if and only if it satisfies the following conditions:

the label of the root is in J₁,
the label of every leaf is in J₂,
if a node labeled by Y is the first child of a node labeled by X, then XY ∈ I₁,
if a node labeled by Y is the last child of a node labeled by X, then YX ∈ I₂, and
if a node labeled by X and a node labeled by Y are adjacent siblings (in this order), then XY ∈ I₃.

Clearly, any super-local set of trees of bounded out-degree must be local, but it is easy to see that there are local sets, including sets of the form PT(G), that are not super-local. The set of "rule trees" of a CFG, where a rule, rather than a nonterminal, labels each node, need not be super-local either.

The strengthened form of the Chomsky-Schützenberger Theorem is an immediate consequence of the following easy lemma:

Lemma. If L is a local set of trees over Γ, then there exist a super-local set L′ over Γ′ and a relabeling g: Γ′ → Γ such that g maps L′ onto L.

The idea is to add to the label of each node information about the labels of all its siblings. This construction can be seen to underly Kozen's (1997) proof of the Chomsky-Schützenberger Theorem. While a "rule tree" of a CFG labels each node by the rule expanding it, the tree used by Kozen puts information about the rule on the children of the node expanded by the rule. It is in fact sufficient for each child to be labeled by the right-hand side of the rule expanding its parent (plus its position among its siblings).

Proof. Let J ⊆ Γ, I ⊆ Γ⁺ be the finite sets witnessing the locality of L. Let

Γ′ = { [X]₁ | X ∈ J } ∪ { [α]_i | X ∈ Γ, Xα ∈ I, 1 ≤ i ≤ |α| },

and set

J₁ = { [X]₁ | X ∈ J },
J₂ = { [X₁ … X_l]_i | X_i ∈ I },
I₁ = { [X₁ … X_l]_i [Y₁ … Y_m]₁ | X_i Y₁ … Y_m ∈ I, m ≥ 1 },
I₂ = { [Y₁ … Y_m]_m [X₁ … X_l]_i | X_i Y₁ … Y_m ∈ I, m ≥ 1 },
I₃ = { [X₁ … X_l]_i [X₁ … X_l]_i+1 | 1 ≤ i < l}.

Let L′ be the super-local set determined by J₁, J₂, I₁, I₂, I₃, and let

g([X₁ … X_l]_i) = X_i.

Now it is clear that g gives a one-one correspondence between the trees in L′ and the trees in L.

Note that since the conditions given by I₁, I₂, I₃ in the above proof are not independent, the definition of either I₁ or I₂ (but not both) may be changed to drop the part that says X_iY₁ … Y_m ∈ I, without thereby affecting the resulting super-local set L′. If we wish to have τ(L′) = D_n ∩ R, rather than τ(L′) = D′_n ∩ R, that's possible too by changing the label of the root of every tree to a special symbol that only appears at the root.

In the above proof, each node label in a tree from L is changed to the sequence of original labels of all its siblings, including itself, plus its position among its siblings. Clearly, there are more economical ways of adding information to each node label to create a super-local set. For instance, we can use labels of the form X_i … X_m consisting of the node's original label followed by the original labels of all its right siblings.

Let us call a tree language super-regular if it can be expressed in the form τ^-1(D′_m ∩ R) with R regular. Every super-regular tree language is regular, since it can be recognized by a "left-to-right" one-visit tree-walking automaton (cf. Okhotin, Salomaa, and Domaratzki 2002). The converse is of course not true, and the super-regular tree languages don't even contain all the local tree languages. Here is an example of a Chomsky normal form CFG G such that PT(G) is not super-regular. This shows that the set D_n ∩ R in the Chomsky-Schützenberger Theorem, even in its weak form, cannot in general be taken to be τ(PT(G)), unless the grammar G satisfies some special conditions.

S → S A
S → B S
S → a
A → a
B → a

That the set of parse trees of this grammar is not super-regular can be seen as follows. Consider strings of the form

(*) [_S x₁ … [_S x_k [_S [_a ]_a ]_S y_k ]_S … y₁ ]_S, with x_i ∈ { [_A [_a ]_a ]_A, ε } and y_i ∈ { [_B [_a ]_a ]_B, ε } (i = 1, … k).

Any string of the form (*) is in τ(PT(G)) if and only if for all i = 1, … k, either

x_i = [_A [_a ]_a ]_A and y_i = ε

x_i = ε and y_i = [_B [_a ]_a ]_B.

This means that the 2^k strings of the form

(**) [_S x₁ … [_S x_k, with x_i ∈ { [_A [_a ]_a ]_A, ε } (i = 1, … k)

are such that no two of them stand in the Myhill-Nerode right-congruence relation ≡_τ(PT(G)) associated with τ(PT(G)).

Now let D′₄ be the set of Dyck primes over { [_S,]_S,[_A,]_A,[_B,]_B,[_a,]_a }, and R be any regular set of strings. Clearly, all strings of the form (**) stand in the relation ≡_D′₄ to each other. Since R is regular, ≡_R has only finitely many equivalence classes. So if we choose large enough k, there are distinct strings w₁, w₂ of the form (**) that stand in the relation ≡_R. Since for any string languages L₁ and L₂, ≡_{L₁ ∩ L₂} is a coarser equivalence relation than ≡_L₁ ∩ ≡_L₂, the strings w₁, w₂ must stand in the relation ≡_{D′₄ ∩ R}. Therefore, we cannot have D′₄ ∩ R = τ(PT(G)) for any regular set R. Since τ(PT(G)) = D′₄ ∩ R is implied by τ(PT(G)) = D₄ ∩ R, we cannot have the latter, either, for any regular R.

Update January 18, 2012: A similar proof appears in Libkin 2006.

When is PT(G) super-regular? Chomsky's (1963) proposed sufficient condition was that G be a modified normal grammar in the sense that G is in Chomsky normal form and contains no pair of rules

A → B C
D → C E

for any nonterminals A, B, C, D, E. However, Chomsky's proof that (in our words) PT(G) is super-regular for any modified normal grammar G does not work. Adding a further condition that says that if

A → B C
A → D E
F → B E

are rules of the grammar, then so is

A → B E

guarantees that PT(G) is super-local. I have so far been unable to tell whether there is any modified normal grammar G for which PT(G) is not super-regular.

Mar 26, 2010

Mathematical Logic and Computability

I'll be teaching logic to graduate students in philosophy this coming semester. The textbook I'll use is H. Jerome Keisler et al.'s (1996) Mathematical Logic and Computability. Despite the fact that it has many typos, I like this book. It uses tableaus for proofs and register machines for computation. It contains a detailed treatment of Gödel's Incompleteness Theorems. The book's level is definitely more adequate for my purposes than Boolos and Jeffrey's Computability and Logic. I'll see how much of the book I can cover in two semesters.

The book is out of print, but I found a PDF version at the following URL:

http://www.cs.mum.edu/courses/corazza/LogicAndComputability.pdf

This version is different from the older version that shows up in Google Scholar. It's not exactly the same as the printed book, but it's very close. (The title page of the PDF version lists Paul Corazza, H. Jerome Keisler, Kenneth Kunen, Terrence Millar, Arnold Miller, and Joel Robbin as the authors.)

The review by Leon Harkleroad states that the definition of a recursively enumerable set in an exercise to Chapter 5 is incorrect. I see nothing wrong with the definition, which reads:

A subset A ⊂ N is recursively enumerable (or r.e.) if A = ∅ or A is the range of a total computable function

The book comes with a software package designed for DOS and Windows. In the words of the authors, "The programs are copyrighted by the authors of this book and are part of the public domain." I can't find the software anywhere on the web, but fortunately I still keep the diskette that came with my copy of the book. I was able to run it on my Mac using CrossOver Mac. With WineBottler I had less success.

Mar 2, 2010

Kit Fine's situation semantics

Kit Fine gave an invited talk at the Third Interdisciplinary Ontology Conference on Sunday, titled "A State-Based Approach to the Frame Problem". He gives a kind of situation semantics to the classical propositional language, and then extends it to a hybrid modal language for reasoning about change.

A model for the propositional language is an ordered triple (S,≤,[ ]), where (S,≤) is a partially ordered set of states (his term for partial situations), subject to certain conditions, and [ ] is a valuation function assigning each propositional variable a pair ([p]⁺,[p]^-) of sets of verifying and falsifying states, subject to certain natural conditions. The recursive clauses for ∧ and ∨ are interesting:

s verifies B ∧ C if for some t and u, t verifies B, u verifies C, and s = t+u (the l.u.b of {t,u}).
s falsifies B ∧ C if s falsifies B or s falsifies C or for some t and u, t falsifies B, u falsifies C, and s = t+u.

s verifies B ∨ C if s verifies B or s verifies C or for some t and u, t verifies B, u verifies C, and s = t+u.
s falsifies B ∨ C if for some t and u, t falsifies B, u falsifies C, and s = t+u.

Let (S,≤) be the upper semi-lattice generated by three states s₁, s₂, s₃. Then (S,≤) is a state space according to Fine's definition. Let p₁, p₂, p₃ be propositional variables, and for each i = 1,2,3, let

[p_i]⁺ = { s_i }, [p_i]^- = ∅.

Then (S,≤,[ ]) is a model. The formula (p₁ ∧ p₂) ∨ p₃ is verified by the states s₁+s₂, s₃, and s₁+s₂+s₃, but not by any other states. This means that the recursive clauses do not preserve the following condition on the sets of verifying and falsifying states of a formula:

s verifies A, u verifies A, and s ≤ t ≤ u imply t verifies A.
s falsifies A, u falsifies A, and s ≤ t ≤ u imply t falsifies A.

Subscribe to: Posts ( Atom )