C++ Parsing suggestions please!

FaRKle0079

2[H]4U
Joined
Jan 11, 2005
Messages
2,686
Hey all,

So I'm having a difficult time conceptually figuring out how I'm going to be parsing input commands for one of my C++ class' assignments.

The input commands can have some of these following forms (# = a number)::
(command # # #)(command #)(command (command (command #) # #)( command #) #)

The program keeps reading till EOF, so input can either be from a file or from a user typing on his or her keyboard. There are a bunch of different commands, and each command has a specified amount of numbers following it. All commands are enclosed in ( ), but there can be any amount of whitespace or the character '0' between commands.

The first way I thought of doing this was just passing everything into a string via cin since it would automatically get rid of the whitespace. That way would force me to make multiple cases however in case there were 0's between the commands and it requires me to make multiple cases for the last # in each command since it may or may not be right up against a ) to end the command. I'd go from reading all the contents of each cin to reading it character by character, storing the indexes of the beginning and end of what I want, and then putting that into a substring.

The commands furthest inside the ( ) will be executed first and then it'll work its way out to the outer commands. The way this is set up has me thinking of trying to store the commands and their various parameters in a binary tree. Because of the inside to out structure it also seems like recursion would work well for executing the commands (still researching on how to do this though).

Would you guys have any methods you suggest I look into on how to parse/input these commands? Using cin will probably work, but I'm wondering if there are any tips I haven't learned yet.
 
Last edited:
Step 1 is to tokenize the string - split it up into its components. In your case, '(', ')', each of the command strings, and numeric literals would be tokens.

Once you have the tokens, parsing them into a grammar is a clever little trick. If you're allowed to use 3rd-party tools to generate your parser, I highly recommend learning lex/flex and yacc/bison - very useful tools. If you have to write the parser yourself, that's a bit more complicated, but as Khanmots pointed out, recursion is your friend.
 
Ok, don't try to do this any way but recursion. you can do it via the 3rd party tools but a basic outline is below. The tokenization is not really covered but is pretty critical. you need to know what the current language element is and process it before moving on.


parsecmd: validate "("; getcmd; validate ")";

getcmd: if "(" then parsecmd else docmd;

docmd: validate command;
-> while not "(" then getarg
-> execute cmd with args
-> return result

getarg:
-> if "(" then return parsecmd
-> if # then build + return arg
 
I wouldn't worry about explicit tokenizing; Simply parse left to right removing unneeded padding as you go (may require a small buffer to pseudo-peek) and then making your recursive call when you hit a '(' and returning the result of the command when you find a ')' and looping on that initial recursive call until EoF. Seems like an interesting problem.
 
If not recursion (the professor may intentionally try to overflow your stack), create and use a stack.
 
Step 1 is to tokenize the string - split it up into its components. In your case, '(', ')', each of the command strings, and numeric literals would be tokens.

I completely forgot about that! I used strtok on a previous assignment, but didn't get it working correctly. Sounds like a good idea for this assignment though.

If you're allowed to use 3rd-party tools to generate your parser, I highly recommend learning lex/flex and yacc/bison - very useful tools. If you have to write the parser yourself, that's a bit more complicated, but as Khanmots pointed out, recursion is your friend.

Hrm I'll see if my schools computers have those installed (I SSH in for all my programming). The professor said we could use whatever tools we wanted, as long as it compiled on the department machines and ran.

If not recursion (the professor may intentionally try to overflow your stack), create and use a stack.

My professor isn't going to do this :) I've considered using a stack, but am reading up on it to understand it better.
 
If not recursion (the professor may intentionally try to overflow your stack), create and use a stack.

Unless he's having to test on something really strange he's probably fine. His worst-case recursive stack use is going to be linearly based on the length of the input, and I really doubt that the prof is going to shove a huge (hundreds of megs) worst-case test file at it.

Now, if he was using recursion to create a tree for later traversal of all possible permutations of a set of items... then things would be different. :D
 
Trim, and Replace functions are your friend.

You could run an algorithm that searches for the '(' character, and puts everything between that char and ')' into a string, and then run a recursive on that string incase there are other commands inside.

void getCommand(String commandlist) {
int start = end = -1;
int x=0;
while(x<strlen(commandlist) || (start != -1 && end != -1)) {
if(commandlist[x] == '(')
start=x;
if(commandlist[x] == ')')
end=x;
x++
}
--> Then strip '(' and ')' and run command inside, then pull out the command you just ran from the commandlist variable (create a temp, copy from 0 to (start - 1), and append (end +1) to strlen(commandlist), and run the commandlist variable through the same function again.
 
Hrm I'll see if my schools computers have those installed (I SSH in for all my programming). The professor said we could use whatever tools we wanted, as long as it compiled on the department machines and ran.

I'd double check if lex & yacc (or other similar) tools are allowed.

The overview to how those work is that you write regex to match tokens, and then grammar rules for how the tokens go together. Then it takes that specification and machine generates C (or other lang) code. Depending on the tool you'd embed your own code in the grammar rules to do 'stuff' or then parse the resulting tree at the end.

Your assignment seems more along the lines of writing the parser yourself than using a tool to do that part for you.
 
Unless he's having to test on something really strange he's probably fine. His worst-case recursive stack use is going to be linearly based on the length of the input, and I really doubt that the prof is going to shove a huge (hundreds of megs) worst-case test file at it.

Now, if he was using recursion to create a tree for later traversal of all possible permutations of a set of items... then things would be different. :D

None of you had my professors, then. Once you have a prof that positively delights in "what, it broke? but I tested it so thoroughly!" your outlook changes a bit ;).
 
None of you had my professors, then. Once you have a prof that positively delights in "what, it broke? but I tested it so thoroughly!" your outlook changes a bit ;).

You think that's bad? Mine compiled it into assembly code, and then ran it so he could watch every transaction.
 
um, actually I think you could optionally number some punch cards, and they would be reordered. but my memory might be wrong. i think that is where early basic programs picked up the numbering they did
 
I was being serious. :(

What you're saying doesn't make any sense, and I'm shocked you believe a prof that told you that. There's a huge difference between having a test suite that runs 100 tests, one of which is to try and smash your stack, and manually disassembling and inspecting execution flow for every student.
 
he probably meant the professor ran the code through a debugger

Most of the time, that would suppress errors that would otherwise lead to segfault. Perhaps something like valgrind or application verifier?
 
No self-respecting professor would look at the assembly. If he was worth his salt and good, he would just look at the damn programs source code
 
uh, depending on the class there can be a lot of value on demonstrating how code is compiled and executed and what goes in the registers and so forth. but I guess nobody programs like that anymore nowadays....
 
If you write in Assembly / work with smaller embedded systems then you would care about more about that stuff...

At the school I go to, you started out with C in one class, and another with ASM for a small microcontroller in another. So you really do get to work with exactly what's happening. But when it comes to writing real / large programs using a high-level language like C / C++ is obviously the better route to go. After the first semester we move from C to a mix of C and C++ then after that it's really just writing in C++. Although there isn't a massive difference between C and C++ for the programs we're writing.

Our professors just have a lot of test drivers that stress your program / time it. And then the assistants look through your code for "quality"... depending on the class though... For the graphics class it's more important to get the correct visual output, but they still check the code to make sure your algorithms are correct (2D rasterization.. )

I figure the std::string should have enough functionality for parsing if you spend enough time with it.
 
What you're saying doesn't make any sense, and I'm shocked you believe a prof that told you that. There's a huge difference between having a test suite that runs 100 tests, one of which is to try and smash your stack, and manually disassembling and inspecting execution flow for every student.

Have you taken an assembly code course? We started writing using M68000 then, changed over to C to write more complex stuff, and the compiler translates it back into M68000 so it can run on our embedded boards.
 
I'd handwrite a pushdown automata for the language and call command functions when I pop the end parentheses.
 
I'd handwrite a pushdown automata for the language and call command functions when I pop the end parentheses.

Yeah, I was doing some more reading yesterday about vectors and stacks and think that might work well if when I hit a '(' to put the command in a stack, and then execute and pop the command when I hit a ')'.
 
Back
Top