TODO:

* Documentation
  - Finish the reference manual of the API.
  - Finish the manual describing the syntax and semantics of regexps.
  - Write a description of the algorithms used. There's already my
    Master's Thesis, but it's not TRE-specific, and it's a thesis, not
    an algorithm description.
  - Write a man page for the TRE regexp syntax.

* POSIX required features
  - Support for collating elements and equivalence classes. This
    requires some level of integration with libc.

* New features
  - Support for GNU regex extensions:
    - word boundary syntax [[:<:]] and [[:>:]]
    - beginning and end of buffer assertions ("\`" and "\'")
    - is there something else missing?
  - Better system ABI support for non-glibc systems?
  - Transposition operation for the approximate matcher?

* Extend API
  - Real-time interface?
    - design API
    - return if not finished after a certain amount of work
    - easy for regexec(), more work for regcomp().

* Optimizations
  - Make specialized versions of matcher loops for REG_NOSUB.
  - Find out the longest string that must occur in any match, and
    search for it first (with a fast Boyer-Moore search, or maybe just
    strstr()). Then match both ways from the hit to see whether it was
    part of a match.
  - A pessimistic histogram filter of some kind might speed up
    searching for approximate matches.
  - Optimize tre_tnfa_run_parallel to be faster (swap instead of
    copying everything? Assembler optimizations?)
  - Write a benchmark suite to see what effects different
    optimizations have in different situations.
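One possible shape for the "return if not finished after a certain amount of work" idea on the regexec() side is a search whose entire progress lives in a caller-owned struct, so it can stop after a fixed work budget and be resumed later. This is only an illustrative sketch: rt_search, rt_init, rt_step and the RT_* status codes are hypothetical and not part of TRE, and a plain literal scan stands in for the real matcher. A resumable regcomp() is not attempted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for a budgeted, resumable search. */
enum rt_status { RT_MATCH, RT_NOMATCH, RT_INCOMPLETE };

/* Hypothetical search state: everything needed to resume is stored
   here rather than on the call stack. */
struct rt_search {
  const char *text;
  size_t n;
  const char *lit;   /* literal stands in for a compiled regexp */
  size_t m;
  size_t pos;        /* next candidate offset to try */
};

static void rt_init(struct rt_search *st, const char *text, size_t n,
                    const char *lit, size_t m)
{
  st->text = text;
  st->n = n;
  st->lit = lit;
  st->m = m;
  st->pos = 0;
}

/* Tries at most `budget` candidate offsets and then returns.
   RT_INCOMPLETE means "call again to continue"; on RT_MATCH the
   match offset is stored in *match_pos. */
static enum rt_status rt_step(struct rt_search *st, size_t budget,
                              size_t *match_pos)
{
  while (budget-- > 0)
    {
      if (st->m > st->n || st->pos > st->n - st->m)
        return RT_NOMATCH;
      if (memcmp(st->text + st->pos, st->lit, st->m) == 0)
        {
          *match_pos = st->pos;
          return RT_MATCH;
        }
      st->pos++;
    }
  return RT_INCOMPLETE;
}
```

A caller with a deadline would call rt_step() in a loop, doing other work between calls whenever it gets RT_INCOMPLETE back.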
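The "longest string that must occur in any match" prefilter could look roughly like the following sketch. It assumes the required literal has already been extracted from the parsed regexp (that analysis is not shown), and it uses Boyer-Moore-Horspool, a simplified Boyer-Moore variant that only applies the bad-character rule, rather than full Boyer-Moore. The function name find_required_literal is hypothetical and not part of TRE's API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical prefilter: Boyer-Moore-Horspool search for a literal
   that every match of the regexp is assumed to contain.  Returns the
   offset of the first occurrence in text, or -1 if there is none. */
static long find_required_literal(const char *text, size_t n,
                                  const char *lit, size_t m)
{
  size_t shift[256];
  size_t i, pos;

  if (m == 0)
    return 0;
  if (m > n)
    return -1;

  /* Bad-character table: how far the window may slide when the text
     byte under the last pattern position is c. */
  for (i = 0; i < 256; i++)
    shift[i] = m;
  for (i = 0; i + 1 < m; i++)
    shift[(unsigned char)lit[i]] = m - 1 - i;

  pos = 0;
  while (pos + m <= n)
    {
      if (memcmp(text + pos, lit, m) == 0)
        return (long)pos;
      pos += shift[(unsigned char)text[pos + m - 1]];
    }
  return -1;
}
```

A -1 result would let the caller skip the full matcher entirely. A hit only proves that a match is possible, so the real matcher still has to confirm it by extending backwards and forwards from the returned offset.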
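"Swap instead of copying everything" refers to a double-buffering pattern common in parallel (Thompson-style) NFA simulation: keep two state-set buffers and exchange pointers after each input character instead of copying the next set back into the current one. The sketch below shows the pattern on a hypothetical three-state NFA for (a|b)*ab; the edge table, NSTATES and toy_nfa_match are illustrative assumptions and do not reflect tre_tnfa_run_parallel's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NSTATES 3

/* Hypothetical toy NFA for (a|b)*ab: state 0 is the start state,
   state 2 is the only accepting state. */
struct edge { int from; char ch; int to; };

static const struct edge edges[] = {
  { 0, 'a', 0 }, { 0, 'b', 0 }, { 0, 'a', 1 }, { 1, 'b', 2 }
};

/* Runs all active states in parallel over the whole string.  The two
   state sets live in fixed buffers and only the pointers are swapped
   after each character, so no set is ever copied. */
static bool toy_nfa_match(const char *s)
{
  bool buf_a[NSTATES], buf_b[NSTATES];
  bool *cur = buf_a, *next = buf_b, *tmp;
  size_t i;

  memset(cur, 0, sizeof buf_a);
  cur[0] = true;

  for (; *s != '\0'; s++)
    {
      /* The next set still has to be cleared, but not copied. */
      memset(next, 0, sizeof buf_b);
      for (i = 0; i < sizeof edges / sizeof edges[0]; i++)
        if (cur[edges[i].from] && edges[i].ch == *s)
          next[edges[i].to] = true;

      /* O(1) pointer swap instead of memcpy(cur, next, ...). */
      tmp = cur;
      cur = next;
      next = tmp;
    }
  return cur[2];
}
```

In a real tagged NFA each state-set entry also carries tag values, which is what makes avoiding the per-character copy worthwhile.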