Thursday, January 04, 2007

Programmer's Almanac: Jan 5, 2007

On this day in 1909, Stephen Cole Kleene, inventor of the regular expression, was born. A mathematician, Kleene laid some of the groundwork of computer science through his work on recursion theory alongside Turing, Gödel and others.

From a practical standpoint, programmers can thank Kleene for inventing regular expressions in the 1950s. Regular expressions give programs a compact, powerful way to match strings against patterns.
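Kleene's name survives in the syntax itself: the `*` operator, meaning "zero or more of the preceding item", is called the Kleene star. Here is a minimal sketch using Python's standard `re` module; the pattern and the helper name `matches` are just illustrations.

```python
import re

# "ab*c": an 'a', then zero or more 'b's (the Kleene star), then a 'c'.
PATTERN = re.compile(r"ab*c")

def matches(s: str) -> bool:
    # fullmatch succeeds only if the entire string fits the pattern
    return PATTERN.fullmatch(s) is not None

if __name__ == "__main__":
    for s in ["ac", "abc", "abbbbc", "abd", "xabc"]:
        print(f"{s!r}: {matches(s)}")
```

Because the star allows zero repetitions, "ac" matches just as "abbbbc" does, while "abd" and "xabc" do not.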

Here is a regular expression that will shallow-parse any XML, splitting it into tags, text, comments and other tokens (the original source is much more readable):

[^<]+|<(!(--([^-]*-([^-][^-]*-)*->?)?|\[CDATA\[([^]]*]([^]]+])*]+([^]>][^]]*]([^]]+])*]+)*>)?|DOCTYPE([ \n\t\r]+([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*([ \n\t\r]+(([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*|"[^"]*"|'[^']*'))*([ \n\t\r]+)?(\[(<(!(--[^-]*-([^-][^-]*-)*->|[^-]([^]"'><]+|"[^"]*"|'[^']*')*>)|\?([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*(\?>|[\n\r\t ][^?]*\?+([^>?][^?]*\?+)*>))|%([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*;|[ \n\t\r]+)*]([ \n\t\r]+)?)?>?)?)?|\?(([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*(\?>|[\n\r\t ][^?]*\?+([^>?][^?]*\?+)*>)?)?|/(([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*([ \n\t\r]+)?>?)?|(([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*([ \n\t\r]+([A-Za-z_:]|[^\x00-\x7F])([A-Za-z0-9_:.-]|[^\x00-\x7F])*([ \n\t\r]+)?=([ \n\t\r]+)?("[^<"]*"|'[^<']*'))*([ \n\t\r]+)?/?>?)?)
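The core idea behind the pattern above is easier to see in a much-simplified cousin. The following sketch is an illustration only, not the full pattern: a token is either a run of text containing no `<`, or a `<` followed by everything up to the next `>`. The name `tokenize` is hypothetical, and unlike the full pattern this version does not understand comments, CDATA sections, or a `>` inside a quoted attribute value.

```python
import re
from typing import List

# Simplified shallow tokenizer: either a run of non-'<' text, or a tag
# starting with '<' and running up to and including the next '>'.
# The trailing '>?' lets an unterminated tag at the end of the input
# still come out as a token instead of being dropped.
TOKEN = re.compile(r"[^<]+|<[^>]*>?")

def tokenize(xml: str) -> List[str]:
    # findall returns every non-overlapping match, in order
    return TOKEN.findall(xml)

if __name__ == "__main__":
    print(tokenize('<p class="x">Hi <b>there</b></p>'))
```

Run on `<p class="x">Hi <b>there</b></p>`, this yields the six tokens `<p class="x">`, `Hi `, `<b>`, `there`, `</b>` and `</p>`. Note that a regular expression can only tokenize XML this way; checking that tags are properly nested needs more than a regular language can express.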

Also on this day, in 1984, Richard Stallman began work on the GNU project, having quit his job at MIT to develop GNU full time; in 1985 he founded the Free Software Foundation. From Stallman's efforts we now have (among other things) the GCC compiler, the GNU General Public License (GPL), and free versions of the Unix utilities that, combined with the Linux kernel, make up today's GNU/Linux systems.

Stallman quote: "If programmers deserve to be rewarded for creating innovative programs, by the same token they deserve to be punished if they restrict the use of these programs."

(c) 2007, Jorge Monasterio
