ben-scholzen
DESCRIPTION
A parser is an integral part of implementing a domain-specific language or reading a file format, such as our example use case: the iCal format. This session covers the general concepts of tokenizing and parsing into a data structure, and goes into depth on how to keep the memory footprint and runtime low with the help of a stream tokenizer.
2. What we'll cover
3. Basic structure of a tokenizer and a parser
4. Where to optimize things for PHP
5. What about parser generators?
6. They are evil!
7. Create lots of function calls, like Lemon parsers in C
8. Do not perform well
9. Will eat up all your memory
10. Conclusion
11. Let's get started
12. What a compiler is and how it works
13. Converts human-readable data into machine-readable data
14. Consists of two components:
The lexer:
15. Reads the input stream
16. Cleans up the input data
17. Creates a list of tokens
The parser:
18. Converts them into a data structure
19. What a compiler is and how it works
[Diagram labels: Lexer, Parser, Tokens, Document, Stream, Structure]
20. Sounds great, but where do I need it?
21. Wiki-Codes
Description languages:
22. XML
Even programming languages:
23. PHP
Anything else you want your program to understand
24. The lexer (or tokenizer)
25. What are tokens?
26. Corresponding block of text (lexeme)
List of tokens represents an entire document
27. Example in PHP: $value = 5 * 7 ;
28. How the tokenizer works
29. Tokenize the input in a loop
30. Reading char-by-char is too slow
31. Use the offset parameter
32. Use the \G assertion (^ won't work)
Always store the current position
33. Use either a switch-statement or a structured array
Return the tokens
34. What we can optimize
35. Requires previous knowledge about when tokens end
Offer a method for the parser to get a partial bunch of tokens
Speed up execution-time
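The tokenizer points above (loop over the input, offset parameter, \G assertion, stored position, partial delivery of tokens) can be sketched roughly as follows. This is only an illustration and not the code from the talk: the use of preg_match(), the TOKEN_PATTERNS table, the tokenize() function and the generator-based delivery are all assumptions made for this example, and it needs a PHP version with generators and scalar type hints.

```php
<?php
// Hypothetical token table for the "$value = 5 * 7 ;" example (an assumption,
// not the speaker's grammar). Keys are token types, values are PCRE fragments.
const TOKEN_PATTERNS = [
    'variable' => '\$[A-Za-z_][A-Za-z0-9_]*',
    'number'   => '[0-9]+',
    'operator' => '[-+*\/=;]',
];

/**
 * Yields [type, lexeme] pairs one at a time, so the consumer never has to
 * hold the complete token list in memory.
 */
function tokenize(string $input): Generator
{
    $length = strlen($input);
    $offset = 0; // always store the current position

    while ($offset < $length) {
        // \G anchors the match at $offset; ^ would anchor at the start of
        // the whole subject string and therefore fail for any offset > 0.
        if (preg_match('/\G\s+/', $input, $match, 0, $offset)) {
            $offset += strlen($match[0]); // skip whitespace, emit nothing
            continue;
        }

        foreach (TOKEN_PATTERNS as $type => $pattern) {
            if (preg_match('/\G' . $pattern . '/', $input, $match, 0, $offset)) {
                $offset += strlen($match[0]);
                yield [$type, $match[0]];
                continue 2; // back to the outer loop for the next token
            }
        }

        throw new RuntimeException("Unexpected input at offset $offset");
    }
}
```

Iterating with foreach (tokenize('$value = 5 * 7 ;') as $token) pulls one token per iteration; for that input the sketch yields six pairs: variable "$value", operator "=", number "5", operator "*", number "7", operator ";". The structured array of patterns is one of the two dispatch options named on the slide; a switch-statement over explicit states would be the other.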
36. Going into practice
37. The beginning
38. Offer a method for the parser to get a partial bunch of tokens
Speed up execution-time
Make no internal function calls where possible
39. Throwing in a file
40. Preparing stuff
41. Base state
42. Operator state
43. Value state
44. Rounding it up
45. Some actual testing
46. And what we get
array(6) {
  [0]=>
  array(2) {
    [0]=>
    string(8) "variable"
    [1]=>
    string(6) "$value"
  }
  [1]=>
  array(2) {
    [0]=>
    string(8) "operator"
    [1]=>
    string(1) "="
  }
  [2]=>
  array(2) {
    [0]=>
    string(6) "number"
    [1]=>
    string(1) "5"
  }
  [3]=>
  array(2) {
    [0]=>
    string(8) "operator"
    [1]=>
    string(1) "*"
  }
  [4]=>
  array(2) {
    [0]=>
    string(6) "number"
    [1]=>
    string(1) "7"
  }
  [5]=>
  array(2) {
    [0]=>
    string(8) "operator"
    [1]=>
    string(1) ";"
  }
}
89. The parser
90. So we have a bunch of tokens, what now?
91. Create an object-oriented tree structure, or interpret the tokens directly
92. Avoid non-tail recursion
93. Saves you from hitting the stack limit
That's it!
94. Summary
Questions?
95. Where to go from here
96. About tail-recursion in PHP: http://www.alternateinterior.com/2006/09/tail-recursion-in-php.html
97. My blog: http://www.dasprids.de
98. Rate this talk: http://joind.in/635
99. Follow me on Twitter: http://www.twitter.com/dasprid
101. Thank you!