Bottom-up parsing
From Wikipedia, the free encyclopedia
Bottom-up parsing is a strategy for analyzing unknown data relationships that attempts to identify the most fundamental units first, and then to infer higher-order structures from them. It occurs in the analysis of both natural languages and computer languages.
-
See also: top-down parsing
Within computer science, bottom-up parsing is also known as "shift-reduce parsing"
Contents |
Linguistics
In linguistics, an example of bottom-up parsing would be analyzing a sentence by identifying words first, and then using properties of the words to infer grammatical relations and phrase structures to build a parse tree of the complete sentence.
Computer Science
In programming language compilers, bottom-up parsing is a parsing method that works by identifying terminal symbols first, and combines them successively to produce nonterminals. The productions of the parser can be used to build a parse tree of a program written in human-readable source code that can be compiled to assembly language or pseudocode.
Different computer languages require different parsing techniques, although it is not uncommon to use a parsing technique that is more powerful than that actually required.
It is common for bottom-up parsers to take the form of general parsing engines, that can either parse or generate a parser for a specific programming language given a specification of its grammar. Perhaps the most well known generalized parser generator is YACC.
Type of Bottom-up Parsers
The common classes of bottom-up parsing are:
- LR parser
- LR(0) - No lookahead symbol
- SLR(1) - Simple with one lookahead symbol
- LALR(1) - Lookahead bottom up, not as powerful as full LR(1) but simpler to implement. YACC uses this language.
- LR(1) - Most general language, but most complex to implement.
- LR(n) - where n is an integer>=0 indicates an LR parser with n lookahead symbols; while languages can be designed that require more than 1 lookahead practical languages try to avoid this because increasing n can threoretically require exponentially more code and data space (in practice, this may not be as bad).
- Precedence parser
The Parser performs one of two actions (beside accept). These are "Shift" and "Reduce".
- Shift means moving a symbol from the input to the stack
- Reduce means matching a set of symbols in the stack for a more general symbol
For example see figure 1.
An Example of shift-reduce parsing
Figure 1.
Take the language:
S --> AB
A --> a
B --> b
And the input:
ab
Then the bottom up parsing is:
Stack Input
----------+ +-+-+------
| |a|b|
----------+ +-+-+------
Shift a
--------+-+ +-+--------
|a| |b|
--------+-+ +-+--------
Reduce a (A --> a)
--------+-+ +-+--------
|A| |b|
--------+-+ +-+--------
Shift b
------+-+-+ +----------
|A|b| |
------+-+-+ +----------
Reduce b (B --> b)
------+-+-+ +----------
|A|B| |
------+-+-+ +----------
Reduce AB (S --> AB)
--------+-+ +----------
|S| |
--------+-+ +----------
Accept
External links
- [1] An example of shift-reduce parsing (which is a type of bottom up parsing), with a small grammar, state diagram, and c language code to implement the parser
- [2] course notes on shift reduce parsing
- [3] A good non-technical tutorial in the context of natural (human) languages
- [4] A discussion of shift-reduce conflicts in bottom up parsers. A knowledgeable but technical article.

