<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>beta BLOG dot NET - recently in Perl category</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/perl/" />
  <link rel="self" type="application/atom+xml" href="" />
  <id>tag:beta-blog.net,2009-08-27://1</id>
  <updated>2010-08-14T14:32:12Z</updated>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.25</generator>

<entry>
  <title>Perl module of the day (1)</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/2010/08/perl-module-of-the-day-1" />
  <id>tag:beta-blog.net,2010://1.52392</id>

  <published>2010-08-12T23:34:49Z</published>
  <updated>2010-08-14T14:32:12Z</updated>

  <summary>Control where you go when you die() ...</summary>
  <author>
    <name>Sebastian</name>
    <uri>http://beta-blog.net</uri>
  </author>
  
  <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="__meta" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
  
  <content type="html" xml:lang="en" xml:base="http://beta-blog.net/">
  <![CDATA[<p>
Control where you go when you <code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/die.html" target="_blank" rel="help,nofollow">die</a>()</code> ... 
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_f5aqhcn9_1">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="perl"><span class="cmnt">#!/usr/bin/perl</span>
<a class="kwd" href="http://perldoc.perl.org/functions/use.html" target="_blank" rel="help,nofollow">use</a> <a class="pkg" href="http://search.cpan.org/search?query=Religion&amp;mode=module" target="_blank" rel="help,nofollow" title="Module Religion">Religion</a><span class="op stmt">;</span>

<span class="var">$<span class="symb">Die</span>::<span class="symb">Handler</span></span> = <span class="symb">new</span> <span class="symb">DieHandler</span>  <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="help,nofollow">sub</a> <span class="op ld">{</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/die.html" target="_blank" rel="help,nofollow">die</a> <span class="qlo q"><span class="kwd">q</span><span class="op">/</span><span class="str">Goodbye, cruel world</span><span class="op">/</span></span><span class="op stmt">;</span>
<span class="op rd">}</span><span class="op stmt">;</span>

<span class="symb">__END__</span></code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_f5aqhcn9_1')})/*]]&gt;*/</script>
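<p>
For comparison, core Perl offers a similar hook without any extra module: <code>$SIG{__DIE__}</code>. The following is only a minimal sketch. The sub name <code>run_with_die_hook</code> and the array <code>@last_words</code> are made up for illustration, and I'm not claiming this is how the Religion module works internally. Note that the hook is called even inside <code>eval</code>.
</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Messages seen by the hook (hypothetical name, for illustration only)
my @last_words;

sub run_with_die_hook {
    my ($code) = @_;
    # 'local' restores the previous hook when this sub returns
    local $SIG{__DIE__} = sub { push @last_words, $_[0] };
    # The hook runs first, then eval catches the exception as usual
    return eval { $code->(); 1 };
}

my $survived = run_with_die_hook(sub { die "Goodbye, cruel world\n" });
print $survived ? "survived\n" : "died saying: $@";
```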
<p>
Well, the <a href="http://search.cpan.org/perldoc?Religion" target="_blank">Religion module</a> will celebrate its 15th birthday this year! *dance*
</p>]]>
  
  </content>
</entry>

<entry>
  <title>getting rid of MT&apos;s permalink file extensions</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/2010/02/getting-rid-of-mts-permalink-file-extensions" />
  <id>tag:beta-blog.net,2010://1.52387</id>

  <published>2010-02-14T00:07:42Z</published>
  <updated>2010-02-17T22:47:15Z</updated>

  <summary>While designing scalable and portable web projects, it&apos;s always a good idea to design hyperlinks independently of the physical file path from which the respective request is served.</summary>
  <author>
    <name>admin1</name>
    
  </author>
  
  <category term="MT hacks" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="http" label="http" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="mt" label="mt" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
  
  <content type="html" xml:lang="en" xml:base="http://beta-blog.net/">
  <![CDATA[<p>
While designing scalable and portable web projects, it's always a good idea to design hyperlinks independently of the physical file path from which the respective request is served.
For instance, my permalinks usually look like this one:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_hfzyedtt_1">[-] hide code</a></legend><div class="collapsible-container"><pre>
<a href="http://beta-blog.net/2010/02/getting-rid-of-mts-permalink-file-extensions">http://beta-blog.net/2010/02/getting-rid-of-mts-permalink-file-extensions</a>
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_hfzyedtt_1')})/*]]&gt;*/</script>
This way, I can decide later whether to serve pages as HTML, SHTML, PHP, or anything else, without breaking my permalinks.
Now, since I have chosen <a target="_blank" href="http://www.movabletype.org/documentation/administrator/publishing/static-and-dynamic-publishing.html">static publishing</a> with the .html extension, MT creates a file such as
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_hfzyedtt_2">[-] hide code</a></legend><div class="collapsible-container"><pre>
/var/www/beta-blog.net/2010/02/getting-rid-of-mts-permalink-file-extensions.html
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_hfzyedtt_2')})/*]]&gt;*/</script>
so I can tell Apache how to map my permalink to this physical file path using
<a target="_blank" href="http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html">mod_rewrite</a>:
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_hfzyedtt_3">[-] hide code</a></legend><div class="collapsible-container"><pre>
# DocumentRoot /var/www/beta-blog.net
RewriteEngine on
RewriteBase /
RewriteRule ^([^.]*[^./])$ $1.html [L]
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_hfzyedtt_3')})/*]]&gt;*/</script>
<p>
That is, any request path not containing a dot and not ending with a slash will be rewritten to the corresponding HTML file. (Yes, that way I won't be able to serve permalinks containing dots, but I can live with that ;)
</p>
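<p>
To check which request paths get rewritten, the rule can be mimicked in a few lines of Perl. This is a sketch under two assumptions: the rule lives in an .htaccess file in the document root, where the path is matched without its leading slash, and the last character is checked with <code>[^./]</code>, so that a path ending in a dot is left alone as well. The sub name <code>rewrite_path</code> is hypothetical.
</p>

```perl
use strict;
use warnings;

# Mimics "RewriteRule ^([^.]*[^./])$ $1.html [L]" in .htaccess context,
# where the request path is matched without its leading slash.
sub rewrite_path {
    my ($path) = @_;
    # no dot anywhere, last character neither a slash nor a dot
    return $path =~ m{^([^.]*[^./])$} ? "$1.html" : $path;
}

for my $path ('2010/02/getting-rid-of-mts-permalink-file-extensions',
              'styles.css', '2010/02/', 'foo.') {
    print "$path => ", rewrite_path($path), "\n";
}
```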
<p>
Now that's nice, but unfortunately, within MT you cannot configure the appearance of permalinks independently of the file path.
Nobody wants to remove the extension from the actual file, since the same directory might contain .js, .jpg, and other files mapped to their own content types as well. (And I have no idea how to set up an <a target="_blank" href="http://httpd.apache.org/docs/2.0/mod/mod_mime.html#addhandler">AddHandler directive</a> that applies only to files without an extension.) Therefore, the solution is to let MT add the configured extension to the actual file, but remove the extension from the permalink.
</p>
<p>
Admittedly, one has to hack MT's source code itself, but it's quite a simple change to the <code>archive_url</code> method within MT's Entry module. Applying the following patch to <code>lib/MT/Entry.pm</code> will remove .php/.html extensions from archive links:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_hfzyedtt_4">[-] hide code</a></legend><div class="collapsible-container"><pre>
--- MTOS-5.01-en.orig/lib/MT/Entry.pm   2010-02-14 01:17:24.000000000 +0100
+++ MTOS-5.01-en/lib/MT/Entry.pm        2010-02-14 01:58:58.000000000 +0100
@@ -541,7 +541,11 @@
     my $blog = $entry->blog() || return;
     my $url = $blog->archive_url || "";
     $url .= '/' unless $url =~ m!/$!;
-    $url . $entry->archive_file(@_);
+    #$url . $entry->archive_file(@_);
+    ## --&gt; HACK: remove .php/.html extensions &lt;-- ##
+    my $f = $entry->archive_file(@_);
+    $f =~ s/\.(php|html)$//;
+    $url . $f;
 }

 sub permalink {
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_hfzyedtt_4')})/*]]&gt;*/</script>
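<p>
The added substitution can also be tried in isolation. In this sketch, <code>strip_archive_ext</code> is a hypothetical helper, not part of MT; it simply performs the same <code>s/\.(php|html)$//</code> as the patched <code>archive_url</code>:
</p>

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MT): the same substitution that the
# patch adds to archive_url(), applied to an archive file name
sub strip_archive_ext {
    my ($file) = @_;
    $file =~ s/\.(php|html)$//;   # only a trailing .php or .html is removed
    return $file;
}

print strip_archive_ext('2010/02/getting-rid-of-mts-permalink-file-extensions.html'), "\n";
print strip_archive_ext('about.php'), "\n";
print strip_archive_ext('atom.xml'), "\n";   # other extensions are kept
```

<p>
Only the link loses its extension; MT still writes the file with its extension, and the rewrite rule maps the extensionless link back to that file.
</p>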
<p>
Well, that's really a dirty hack, and things will get more complicated with dynamic publishing. So, hopefully one day MT's developers will implement a cleaner solution and make it configurable through the web interface. *hack*
</p>

]]>
  
  </content>
</entry>

<entry>
  <title>AutoSmileys:-) Plugin for Movable Type</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/2009/12/autosmileys-plugin-for-movable-type" />
  <id>tag:beta-blog.net,2009://1.52385</id>

  <published>2009-12-06T16:13:41Z</published>
  <updated>2009-12-15T21:00:07Z</updated>

  <summary>Freely configurable automatic replacement of textual emoticons with image tags. AutoSmileys is an easy-to-use and highly customizable macro environment for Movable Type. It will replace self-defined text abbreviations with image tags when your site is published or dynamically rendered, respectively. It may be applied within entries, comments, pages, or any other part of your site.</summary>
  <author>
    <name>Sebastian</name>
    <uri>http://beta-blog.net</uri>
  </author>
  
  <category term="MT hacks" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="mt" label="mt" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="regex" label="regex" scheme="http://www.sixapart.com/ns/types#tag" />
  
  <content type="html" xml:lang="en" xml:base="http://beta-blog.net/">
  <![CDATA[<h2>Freely configurable automatic replacement of textual emoticons with image tags.</h2>

<h3>Description</h3>
<p>
<em>AutoSmileys</em> is an easy-to-use and highly customizable macro environment
for Movable Type. It will replace self-defined text abbreviations with image tags
when your site is published or dynamically rendered, respectively. It may be applied
within entries, comments, pages, or any other part of your site.
</p>
<p>
<em>AutoSmileys</em> works with both static and dynamic publishing and has been
tested with MT 4.2, MT 4.3, and MT 5.0 beta (Perl 5.8.8 / PHP 5.2.9).
(Dynamic publishing with AutoSmileys requires PHP5 in either case.)
</p>

<h3>Download</h3>
<p>
<a href="http://source.beta-blog.net/autosmileys/0.9/AutoSmileys-0.9.zip">http://source.beta-blog.net/autosmileys/0.9/AutoSmileys-0.9.zip</a>
</p>

<h3>Installation</h3>
<ol>
<li>
Download <em>AutoSmileys-0.9.zip</em>, unzip it and copy the <em>AutoSmileys</em>
directory from the zip file into the plugins directory of your Movable Type installation.
</li>
<li>
Sign in to your Movable Type CMS.
</li>
<li>
For each blog you wish to use AutoSmileys in, select <em>Design &gt; Templates</em> from the menu and open
the desired template for editing.
</li>
<li>
Within each of these templates, enclose the desired parts with the
<span class="code">&lt;mt:AutoSmileys&gt; ... &lt;/mt:AutoSmileys&gt;</span> block tag.
</li>
<li>
Republish your site.
</li>
</ol>
<p>
For instance, if you want <em>AutoSmileys</em> replacement in the main body of your entries,
edit the <em>Entry</em> template as follows:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_afkup0br_1">[-] hide code</a></legend><div class="collapsible-container"><p style="text-align:center">
<img alt="template1.jpg" src="http://beta-blog.net/2009/12/06/template1.jpg" style="width:477px;height:275px;" />
</p></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_afkup0br_1')})/*]]&gt;*/</script>
<p>
Alternatively, you may put the whole HTML body inside the <em>AutoSmileys</em> tag.
This would also apply <em>AutoSmileys</em> to comments, since the <em>Comments</em>
module template is included in the <em>Entry</em> template.
</p>
<p>
For general installation instructions concerning Movable Type, see
<a href="http://www.movabletype.org/documentation/installation/" target="_blank">http://www.movabletype.org/documentation/installation/</a>.
</p>

<h3>Configuration</h3>
<p>
Once you have installed <em>AutoSmileys</em>, sign in to your Movable Type CMS and open
the <em>Tools &gt; Plugins</em> page. Within the <em>AutoSmileys 0.9</em> panel, expand the
<em>Settings</em> tab. By default, it will look as follows:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_afkup0br_2">[-] hide code</a></legend><div class="collapsible-container"><p style="text-align:center">
<img alt="settings1.jpg" src="http://beta-blog.net/2009/12/06/settings1.jpg" style="width:588px;height:701px;" />
</p></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_afkup0br_2')})/*]]&gt;*/</script>
<p>
At the top of the form, you may define HTML tag names whose content will be ignored
in order to avoid inappropriate image tag placement.
The list beneath consists of text abbreviations (tokens)
and associated image source URLs.
</p>
<p>
Default smileys are taken from
<a href="http://www.freesmileys.org" target="_blank">freesmileys.org</a>. *thx*
</p>
<p>
You may change these entries as you like and you may also expand or shorten the list
by clicking the <span class="code">[+]</span> and <span class="code">[-]</span>
links on the right-hand side (JavaScript required).
</p>
<p>
By default, the list can be expanded to up to 40 rows.
To allow more rows, increase the parameter
<span class="code">maxRows</span>
in <span class="code">AutoSmileys.pl</span>. *hack*
</p>

<h3>Remarks</h3>
<p>
Remember that MT tags are applied after any text filter you may have set up for
entries or comments.
Hence, text filters such as <em>Textile</em> and <em>Markdown</em> would
replace stuff like <span class="code">&#42;lol&#42;</span> before
<em>AutoSmileys</em> has a chance to see it.
</p>
<p>
Also, note that tokens are matched literally, exactly as you have defined them,
against the generated HTML source; HTML entities are not decoded beforehand.
That is, <span class="code">&amp;#58;&amp;#45;&amp;#41;</span>
will be displayed as <span class="code">&#58;&#45;&#41;</span> but not replaced.
</p>

<h3>How it works</h3>
<p>
The heart of <em>AutoSmileys</em> is a regular expression, created dynamically from the
mappings of tokens to URLs defined above. Once these mappings have been collected in a Perl hash
called <span class="code"><code class="perl"><span class="var">%<span class="symb">mappings</span></span></code></span>, the pattern is built as follows:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_afkup0br_3">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">re_pattern</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/join.html" target="_blank" rel="nofollow">join</a> <span class="str">&#039;|&#039;</span>,
  <span class="qlo q"><span class="kwd">q</span><span class="op">/</span><span class="str">&lt;(!)--(?:.|\n)*?--&gt;</span><span class="op">/</span></span>,          <span class="cmnt"># markup comments</span>
  <span class="qlo q"><span class="kwd">q</span><span class="op">/</span><span class="str">&lt;([^\s&gt;]+)[^&gt;]*?(\/)?\s*&gt;</span><span class="op">/</span></span>,    <span class="cmnt"># markup tags</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/map.html" target="_blank" rel="nofollow">map</a><span class="op ld">(</span><a class="kwd" href="http://perldoc.perl.org/functions/quotemeta.html" target="_blank" rel="nofollow">quotemeta</a>, <a class="kwd" href="http://perldoc.perl.org/functions/keys.html" target="_blank" rel="nofollow">keys</a> <span class="var">%<span class="symb">mappings</span></span><span class="op rd">)</span><span class="op stmt">;</span>  <span class="cmnt"># smileys</span></code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_afkup0br_3')})/*]]&gt;*/</script>
<p>
Applied to reasonably valid markup, this pattern consumes comments and complete markup tags
as a whole, so tokens are only replaced outside of markup; in addition, the content of the
specified tags that must not contain image tags is skipped.
Then,
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_afkup0br_4">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="perl"><span class="symb">s</span>/<span class="op ld">(</span><span class="var">$<span class="symb">re_pattern</span></span><span class="op rd">)</span>/&amp;<span class="symb">re_callback</span><span class="op ld">(</span><span class="var">$1</span>,<span class="var">$2</span>,<span class="var">$3</span>,<span class="var">$4</span><span class="op rd">)</span>/<span class="symb">eg</span><span class="op stmt">;</span></code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_afkup0br_4')})/*]]&gt;*/</script>
<p>
does the job, using the following callback function:
</p>

<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_afkup0br_5">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">expect</span></span><span class="op stmt">;</span>
<a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="symb">re_callback</span>
<span class="op ld">{</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="op ld">(</span><span class="var">$<span class="symb">match</span></span>, <span class="var">$<span class="symb">comment</span></span>, <span class="var">$<span class="symb">tagname</span></span>, <span class="var">$<span class="symb">selfclosed</span></span><span class="op rd">)</span> = <span class="var">@_</span><span class="op stmt">;</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$<span class="symb">match</span></span> <span class="kwd">if</span> <span class="var">$<span class="symb">comment</span></span> || <span class="var">$<span class="symb">selfclosed</span></span><span class="op stmt">;</span>
  <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/defined.html" target="_blank" rel="nofollow">defined</a> <span class="var">$<span class="symb">tagname</span></span> <span class="op rd">)</span> <span class="cmnt"># non-self-closing markup tag</span>
  <span class="op ld">{</span>
    <span class="var">$<span class="symb">tagname</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/lc.html" target="_blank" rel="nofollow">lc</a> <span class="var">$<span class="symb">tagname</span></span><span class="op stmt">;</span> <span class="cmnt"># ignore case</span>
    <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/defined.html" target="_blank" rel="nofollow">defined</a> <span class="var">$<span class="symb">expect</span></span> <span class="op rd">)</span>  <span class="cmnt"># within ignorance state</span>
    <span class="op ld">{</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/undef.html" target="_blank" rel="nofollow">undef</a> <span class="var">$<span class="symb">expect</span></span> <span class="kwd">if</span> <span class="var">$<span class="symb">tagname</span></span> <a class="kwd" href="http://perldoc.perl.org/functions/eq.html" target="_blank" rel="nofollow">eq</a> <span class="var">$<span class="symb">expect</span></span><span class="op stmt">;</span> <span class="cmnt"># end ignorance state</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$<span class="symb">match</span></span><span class="op stmt">;</span>
    <span class="op rd">}</span>
    <span class="var">$<span class="symb">expect</span></span> = <span class="qlo qq"><span class="kwd">qq</span><span class="op">&lt;</span><span class="istr">/<span class="var">$<span class="symb">tagname</span></span></span><span class="op">&gt;</span></span>
      <span class="kwd">if</span> <a class="kwd" href="http://perldoc.perl.org/functions/exists.html" target="_blank" rel="nofollow">exists</a> <span class="var">$<span class="symb">ignoretags</span><span class="op ld">{</span><span class="var">$<span class="symb">tagname</span></span><span class="op rd">}</span></span><span class="op stmt">;</span> <span class="cmnt"># begin ignorance state</span>
    <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$<span class="symb">match</span></span><span class="op stmt">;</span>
  <span class="op rd">}</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$<span class="symb">match</span></span> <span class="kwd">if</span> <a class="kwd" href="http://perldoc.perl.org/functions/defined.html" target="_blank" rel="nofollow">defined</a> <span class="var">$<span class="symb">expect</span></span> || !<span class="op ld">(</span><a class="kwd" href="http://perldoc.perl.org/functions/exists.html" target="_blank" rel="nofollow">exists</a> <span class="var">$<span class="symb">mappings</span><span class="op ld">{</span><span class="var">$<span class="symb">match</span></span><span class="op rd">}</span></span><span class="op rd">)</span><span class="op stmt">;</span>
  &amp;<span class="op ld">{</span><span class="var">$<span class="symb">Defaults</span><span class="op ptr">-&gt;</span><span class="op ld">{</span><span class="str">image_tag</span><span class="op rd">}</span></span><span class="op rd">}</span><span class="op ld">(</span><span class="var">$<span class="symb">mappings</span><span class="op ld">{</span><span class="var">$<span class="symb">match</span></span><span class="op rd">}</span></span>, <span class="var">$<span class="symb">match</span></span><span class="op rd">)</span><span class="op stmt">;</span>
<span class="op rd">}</span></code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_afkup0br_5')})/*]]&gt;*/</script>
<p>
Here, <span class="code"><code class="perl"><span class="var">$<span class="symb">Defaults</span><span class="op ptr">-&gt;</span><span class="op ld">{</span><span class="str">image_tag</span><span class="op rd">}</span></span></code></span>
is a reference to the function that builds the actual image tag.
That's it 8-).
</p>
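<p>
To try the pieces above outside of MT, here is a self-contained sketch. The sample <code>%mappings</code>, <code>%ignoretags</code>, and the <code>$image_tag</code> closure (standing in for <code>$Defaults-&gt;{image_tag}</code>) are made up for illustration. Two small changes are my own additions, not part of the plugin as published: tokens are sorted longest first, since <code>keys</code> returns them in no particular order and Perl takes the first alternative that matches (so <code>:-)</code> could otherwise shadow <code>:-))</code>), and <code>$expect</code> is reset for every text, so an unclosed ignored tag cannot leak into the next one.
</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample settings (the plugin takes these from its settings form)
my %mappings   = (':-)' => 'smile.gif', ':-))' => 'laugh.gif');
my %ignoretags = (pre => 1, code => 1);
my $image_tag  = sub { my ($src, $alt) = @_; qq{<img src="$src" alt="$alt" />} };

my $re_pattern = join '|',
  q/<(!)--(?:.|\n)*?-->/,          # markup comments
  q/<([^\s>]+)[^>]*?(\/)?\s*>/,    # markup tags
  map(quotemeta, sort { length($b) <=> length($a) } keys %mappings);  # smileys, longest first

my $expect;    # closing tag (e.g. "/pre") awaited while ignoring content

sub re_callback {
    my ($match, $comment, $tagname, $selfclosed) = @_;
    return $match if $comment || $selfclosed;
    if (defined $tagname) {                        # non-self-closing markup tag
        $tagname = lc $tagname;
        if (defined $expect) {                     # within ignorance state
            undef $expect if $tagname eq $expect;  # end ignorance state
            return $match;
        }
        $expect = "/$tagname" if exists $ignoretags{$tagname};  # begin ignorance state
        return $match;
    }
    return $match if defined $expect || !exists $mappings{$match};
    return $image_tag->($mappings{$match}, $match);
}

sub autosmileys {
    my ($html) = @_;
    undef $expect;                                 # fresh state for every text
    $html =~ s/($re_pattern)/re_callback($1, $2, $3, $4)/eg;
    return $html;
}

print autosmileys('Hi :-)) <pre>:-)</pre> <!-- :-) --> <b title=":-)">:-)</b>'), "\n";
print autosmileys('&#58;&#45;&#41;'), "\n";        # entity-encoded: left alone
```

<p>
Tokens inside the comment, the <code>title</code> attribute, and the <code>pre</code> element stay untouched, while the two tokens in plain text are replaced.
</p>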

]]>
  
  </content>
</entry>

<entry>
  <title>a wordlist folding algorithm</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/2009/11/a-wordlist-folding-algorithm" />
  <id>tag:beta-blog.net,2009://1.52384</id>

  <published>2009-11-28T23:33:32Z</published>
  <updated>2009-11-29T16:22:17Z</updated>

  <summary>Suppose you want to match a large wordlist against a huge chunk of text. As a small test case, let
for, far, bar, foo, boofaz, boofar, boof, faz, foobaz, foobars, boofar
be your wordlist. Now, you may apply the corresponding regular expression:
But how would a regex engine actually carry out this match?
</summary>
  <author>
    <name>Sebastian</name>
    <uri>http://beta-blog.net</uri>
  </author>
  
  <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="algorithms" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="codes" label="codes" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="regex" label="regex" scheme="http://www.sixapart.com/ns/types#tag" />
  
  <content type="html" xml:lang="en" xml:base="http://beta-blog.net/">
  <![CDATA[<p>
Suppose you want to match a large wordlist against a huge chunk of text.
As a small test case, let
</p>
<pre class="code">
for, far, bar, foo, boofaz, boofar, boof, faz, foobaz, foobars, boofar
</pre>
<p>
be your wordlist. Now, you may apply the corresponding regular expression:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_ynkfdog0_1">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
(1) /\b(for|far|bar|foo|boofaz|boofar|boof|faz|foobaz|foobars|boofar)\b/
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_ynkfdog0_1')})/*]]&gt;*/</script>
<p>
But how would a regex engine actually carry out this match?
There are different options. Surely the very worst algorithm would be to
search the whole text for every word separately. That would be the same as
doing
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_ynkfdog0_2">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/foreach.html" target="_blank" rel="nofollow">foreach</a> <span class="op ld">(</span><span class="qlo qw"><span class="kwd">qw</span><span class="op">(</span><span class="istr"> for far bar foo boofaz boofar boof faz foobaz foobars boofar </span><span class="op">)</span></span><span class="op rd">)</span>
<span class="op ld">{</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="istr">&quot;matching!&quot;</span> <span class="kwd">if</span> <span class="var">$<span class="symb">text</span></span> =~ <span class="symb">m</span>/\<span class="symb">b</span><span class="var">$_</span>\<span class="symb">b</span>/<span class="op stmt">;</span>
<span class="op rd">}</span>
<a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="istr">&quot;not matching.&quot;</span><span class="op stmt">;</span></code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_ynkfdog0_2')})/*]]&gt;*/</script>
<p>
Assuming you match <span class="math">m</span> words against a text consisting of <span class="math">n</span> letters,
this piece of coding horror has a running time of <span class="math">O(m*n)</span>.
</p>

<p>
Now, a better approach would be to run through the text only once,
using a matching stack. For instance, if <span class="code">&quot; foobar &quot;</span> appears somewhere in
the text, the stack trace might look as follows (read from bottom to top):
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_ynkfdog0_3">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
[7] ' ' =&gt; nothing matches.
[6] 'r' =&gt; &quot;foobars&quot; might match.
[5] 'a' =&gt; &quot;foobaz&quot; or &quot;foobars&quot; might match.
[4] 'b' =&gt; &quot;foobaz&quot; or &quot;foobars&quot; might match.
[3] 'o' =&gt; &quot;foo&quot;, &quot;foobaz&quot;, or &quot;foobars&quot; might match.
[2] 'o' =&gt; &quot;for&quot;, &quot;foo&quot;, &quot;foobaz&quot;, or &quot;foobars&quot; might match.
[1] 'f' =&gt; &quot;for&quot;, &quot;far&quot;, &quot;foo&quot;, &quot;faz&quot;, &quot;foobaz&quot;, or &quot;foobars&quot; might match.
[0] ' ' =&gt; &quot;\b&quot; matches.
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_ynkfdog0_3')})/*]]&gt;*/</script>
<p>
But what if the wordlist gets large? It seems that, each time a character is pushed onto the stack,
we would have to run through nearly the whole list in order to
find out whether the current stack contents may still be matched or not.
</p>
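<p>
The naive candidate filtering described above is easy to sketch in Perl: for each prefix of the input, scan the entire (de-duplicated) wordlist for words starting with it. The sub name <code>candidates</code> is made up; run on <code>foobar</code>, it prints the same candidate sets as steps [1] to [6] of the stack trace above, at the cost of one pass over all <span class="math">m</span> words per character.
</p>

```perl
use strict;
use warnings;

# The wordlist from above; the duplicate "boofar" is dropped
my @words = qw(for far bar foo boofaz boofar boof faz foobaz foobars boofar);
my %seen;
my @unique = grep { !$seen{$_}++ } @words;

# Words that may still match once $prefix has been read: a full scan of
# the wordlist, i.e. O(m) work for every character pushed onto the stack
sub candidates {
    my ($prefix) = @_;
    return grep { index($_, $prefix) == 0 } @unique;
}

my $prefix = '';
for my $char (split //, 'foobar') {
    $prefix .= $char;
    my @c = candidates($prefix);
    print "'$char' => ", (@c ? join(', ', @c) . ' might match.' : 'nothing matches.'), "\n";
}
```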

<p>
An obvious optimization would be to sort the wordlist
in advance. Moreover, rather than looking up one item after another,
a really smart approach would be to walk down a search tree (a prefix tree, or trie).
As a tree, the wordlist above would look like this:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_ynkfdog0_4">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
          _____________|_____________
          |                         |
          b                         f
    ______|______        ___________|___________
    |           |        |                     |
   oof          ar       a                     o
    |                 ___|___            ______|______
    a ?               |     |            |           |
 ___|___              r     z            o           r
 |     |                                 |
 r     z                                 ba ?
                                     ____|____
                                     |       |
                                     rs      z
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_ynkfdog0_4')})/*]]&gt;*/</script>
<p>
Here, the &quot;?&quot; denotes an optional node. Remember that in a reasonably balanced tree, the length of a path downwards
grows only logarithmically with the number of nodes. Thus, loosely speaking,
we have improved the worst algorithm above to roughly <span class="math">O(n*log(m))</span>.
</p>
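<p>
A search tree of this kind can be sketched as a character-level trie built from nested Perl hashes. Unlike the tree drawn above, it keeps one node per character instead of merging single-child chains such as <code>oof</code>, and it marks complete words with an empty-string key rather than with optional nodes. For simplicity, the text is split into <code>\w+</code> runs, which agrees with the <code>\b...\b</code> matching of regex (1) for these words. The sub names are hypothetical.
</p>

```perl
use strict;
use warnings;

# Build a character-level trie from nested hashes; the empty-string key
# marks the end of a complete word (hypothetical helper names throughout)
sub build_trie {
    my $root = {};
    for my $word (@_) {
        my $node = $root;
        $node = $node->{$_} ||= {} for split //, $word;
        $node->{''} = 1;
    }
    return $root;
}

# Walk down the trie once per word of the text, giving up as soon as
# no branch fits; collect the words ending on a complete-word node
sub find_words {
    my ($trie, $text) = @_;
    my @found;
    WORD: for my $w ($text =~ /(\w+)/g) {
        my $node = $trie;
        for my $c (split //, $w) {
            $node = $node->{$c} or next WORD;
        }
        push @found, $w if $node->{''};
    }
    return @found;
}

my $trie = build_trie(qw(for far bar foo boofaz boofar boof faz foobaz foobars boofar));
print join(', ', find_words($trie, 'a foobar, some foo and boofaz or foobars')), "\n";
```

<p>
With hash lookups at each node, the walk costs about one lookup per character of text, and it stops right away for words such as <code>some</code> whose first letter has no branch.
</p>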
<p>
Actually, I'm not sure whether regex engines apply optimizations like this
when compiling. I guess they do, so it might be unnecessary to replace regex <span class="code">(1)</span> above
with the optimized version, which implements the sorted tree of alternative and optional nodes:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_ynkfdog0_5">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
(2) /\b(b(?:ar|oof(?:a(?:r|z))?)|f(?:a(?:r|z)|o(?:o(?:ba(?:rs|z))?|r)))\b/
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_ynkfdog0_5')})/*]]&gt;*/</script>
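<p>
As far as I know, perl 5.10 and later do perform such an optimization, compiling alternations of literal strings into a trie internally. Either way, it is easy to check that regex (2) matches exactly the same words as regex (1). A quick sketch with a made-up test text (the helper <code>all_matches</code> is hypothetical):
</p>

```perl
use strict;
use warnings;

my $re1 = qr/\b(for|far|bar|foo|boofaz|boofar|boof|faz|foobaz|foobars|boofar)\b/;
my $re2 = qr/\b(b(?:ar|oof(?:a(?:r|z))?)|f(?:a(?:r|z)|o(?:o(?:ba(?:rs|z))?|r)))\b/;

# All captured words of a pattern in a text (list-context //g)
sub all_matches {
    my ($re, $text) = @_;
    my @m = $text =~ /$re/g;
    return @m;
}

# Made-up test text: every listed word plus a few near misses
my $text = 'for far bar foo boofaz boofar boof faz foobaz foobars foobar boo fo bars';
my @m1 = all_matches($re1, $text);
my @m2 = all_matches($re2, $text);
print "(1): @m1\n(2): @m2\n";
print "identical\n" if "@m1" eq "@m2";
```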
<p>
Nevertheless, I couldn't resist writing a little Perl routine that folds a wordlist into an
optimized regex. Here it is:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_ynkfdog0_6">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="symb">foldWordsToRegex</span> <span class="op ld">{</span>

  <a class="kwd" href="http://perldoc.perl.org/functions/local.html" target="_blank" rel="nofollow">local</a> *<span class="symb">toString</span> = <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="op ld">{</span>
    <span class="cmnt">## node: [ prefix, [ nodes ], opt ]</span>

    <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="op ld">(</span><span class="var">$<span class="symb">prefix</span></span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">)</span> = <span class="var">@<span class="op ld">{</span><span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op rd">}</span></span><span class="op stmt">;</span>
    <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">rv</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/quotemeta.html" target="_blank" rel="nofollow">quotemeta</a> <span class="var">$<span class="symb">prefix</span></span><span class="op stmt">;</span>
    <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/ref.html" target="_blank" rel="nofollow">ref</a> <span class="var">$<span class="symb">nodes</span></span> <span class="symb">eq</span> <span class="qlo q"><span class="kwd">q</span><span class="op">|</span><span class="str">ARRAY</span><span class="op">|</span></span> &amp;&amp; <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">)</span>
    <span class="op ld">{</span>
      <span class="var">$<span class="symb">rv</span></span> .= <span class="str">&#039;(?:&#039;</span>.<span class="op ld">(</span><a class="kwd" href="http://perldoc.perl.org/functions/join.html" target="_blank" rel="nofollow">join</a> <span class="str">&#039;|&#039;</span>, <a class="kwd" href="http://perldoc.perl.org/functions/map.html" target="_blank" rel="nofollow">map</a> <span class="op ld">{</span> <span class="symb">toString</span><span class="op ld">(</span><span class="var">$_</span><span class="op rd">)</span> <span class="op rd">}</span> <span class="var">@$<span class="symb">nodes</span></span><span class="op rd">)</span>.<span class="str">&#039;)&#039;</span><span class="op stmt">;</span>
      <span class="var">$<span class="symb">rv</span></span> .= <span class="str">&#039;?&#039;</span> <span class="kwd">if</span> <span class="var">$<span class="symb">opt</span></span><span class="op stmt">;</span>
    <span class="op rd">}</span>
    <span class="var">$<span class="symb">rv</span></span><span class="op stmt">;</span>
  <span class="op rd">}</span><span class="op stmt">;</span>

  <a class="kwd" href="http://perldoc.perl.org/functions/local.html" target="_blank" rel="nofollow">local</a> *<span class="symb">fold</span> = <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="op ld">{</span>

    <span class="cmnt">## predeclared (with a ($) prototype, i.e. as a unary operator) so that</span>
    <span class="cmnt">## calls like "reduce [ ... ]" compile without parentheses</span>
    <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="symb">reduce</span>($)<span class="op stmt">;</span>
    <a class="kwd" href="http://perldoc.perl.org/functions/local.html" target="_blank" rel="nofollow">local</a> *<span class="symb">reduce</span> = <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="op ld">{</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="op ld">(</span><span class="var">$<span class="symb">prefix</span></span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">)</span> = <span class="var">@<span class="op ld">{</span><span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op rd">}</span></span><span class="op stmt">;</span>

      <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="kwd">unless</span> <a class="kwd" href="http://perldoc.perl.org/functions/ref.html" target="_blank" rel="nofollow">ref</a> <span class="var">$<span class="symb">nodes</span></span> <span class="symb">eq</span> <span class="qlo q"><span class="kwd">q</span><span class="op">|</span><span class="str">ARRAY</span><span class="op">|</span></span> &amp;&amp; <span class="var">@$<span class="symb">nodes</span></span> &gt; <span class="num">1</span><span class="op stmt">;</span>

      <span class="cmnt">## 1st char of the prefix of 1st node in list</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="op ld">(</span><span class="var">$<span class="symb">c</span></span>, <span class="var">$<span class="symb">qc</span></span><span class="op rd">)</span><span class="op stmt">;</span>

      <span class="cmnt">## check whether 2nd prefix starts with same letter as the 1st</span>
      <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <span class="var">$<span class="symb">c</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/substr.html" target="_blank" rel="nofollow">substr</a> <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span>, <span class="num">0</span>, <span class="num">1</span><span class="op stmt">;</span>
        <span class="var">$<span class="symb">qc</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/quotemeta.html" target="_blank" rel="nofollow">quotemeta</a> <span class="var">$<span class="symb">c</span></span><span class="op stmt">;</span>
        <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>1<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span> =~ <span class="symb">m</span>/^<span class="var">$<span class="symb">qc</span></span>/ <span class="kwd">or</span> <a class="kwd" href="http://perldoc.perl.org/functions/undef.html" target="_blank" rel="nofollow">undef</a> <span class="var">$<span class="symb">c</span></span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="kwd">unless</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/defined.html" target="_blank" rel="nofollow">defined</a> <span class="var">$<span class="symb">c</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="kwd">unless</span> <span class="var">@$<span class="symb">nodes</span></span> &gt; <span class="num">2</span><span class="op stmt">;</span>

        <span class="cmnt">## try to reduce next list part</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">first</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/shift.html" target="_blank" rel="nofollow">shift</a> <span class="var">@$<span class="symb">nodes</span></span><span class="op stmt">;</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">next</span></span> = <span class="symb">reduce</span> <span class="op ld">[</span><span class="str">&#039;&#039;</span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="num">0</span><span class="op rd">]</span><span class="op stmt">;</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">first</span></span>, <span class="var">$<span class="symb">next</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">]</span> <span class="kwd">if</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op stmt">;</span>

        <span class="cmnt">## couldn&#039;t be reduced</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">first</span></span>, <span class="var">@<span class="op ld">{</span><span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>1<span class="op rd">]</span></span><span class="op rd">}</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="cmnt">## reduce any ensuing node whose prefix starts with $c</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">@<span class="symb">new</span></span><span class="op stmt">;</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">newopt</span></span> = <span class="num">0</span><span class="op stmt">;</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/while.html" target="_blank" rel="nofollow">while</a> <span class="op ld">(</span> <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span> =~ <span class="symb">s</span>/^<span class="var">$<span class="symb">qc</span></span>// <span class="kwd">or</span> <a class="kwd" href="http://perldoc.perl.org/functions/last.html" target="_blank" rel="nofollow">last</a><span class="op stmt">;</span>

        <span class="cmnt">## reduce node or detect new optional node</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">n</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/shift.html" target="_blank" rel="nofollow">shift</a> <span class="var">@$<span class="symb">nodes</span></span><span class="op stmt">;</span>
        <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">n</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="op rd">)</span>
        <span class="op ld">{</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/push.html" target="_blank" rel="nofollow">push</a> <span class="var">@<span class="symb">new</span></span>, <span class="var">$<span class="symb">n</span></span><span class="op stmt">;</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/next.html" target="_blank" rel="nofollow">next</a><span class="op stmt">;</span>
        <span class="op rd">}</span>
        <span class="var">$<span class="symb">newopt</span></span> = <span class="num">1</span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="kwd">if</span> <span class="op ld">(</span> <span class="var">@$<span class="symb">nodes</span></span> || <span class="var">$<span class="symb">opt</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">new</span></span> = <span class="symb">reduce</span> <span class="op ld">[</span> <span class="var">$<span class="symb">c</span></span>, <span class="op ld">[</span> <span class="var">@<span class="symb">new</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">newopt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
        <span class="kwd">if</span> <span class="op ld">(</span> <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">)</span>
        <span class="op ld">{</span>
          <span class="cmnt">## reduce remaining nodes</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">next</span></span> = <span class="symb">reduce</span> <span class="op ld">[</span><span class="str">&#039;&#039;</span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="num">0</span><span class="op rd">]</span><span class="op stmt">;</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">new</span></span>, <span class="var">$<span class="symb">next</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">]</span> <span class="kwd">if</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op stmt">;</span>

          <span class="cmnt">## couldn&#039;t be reduced</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">new</span></span>, <span class="var">@<span class="op ld">{</span><span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>1<span class="op rd">]</span></span><span class="op rd">}</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
        <span class="op rd">}</span>

        <span class="cmnt">## current node is optional</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">new</span></span>, <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="cmnt">## nothing left to reduce</span>
      <span class="symb">reduce</span> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>.<span class="var">$<span class="symb">c</span></span>, <span class="op ld">[</span> <span class="var">@<span class="symb">new</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">newopt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
    <span class="op rd">}</span><span class="op stmt">;</span>

    <span class="symb">reduce</span> <span class="op ld">[</span> <span class="str">&#039;&#039;</span>, <span class="op ld">[</span><span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/map.html" target="_blank" rel="nofollow">map</a> <span class="op ld">{</span> <span class="op ld">[</span><span class="var">$_</span><span class="op rd">]</span> <span class="op rd">}</span> <a class="kwd" href="http://perldoc.perl.org/functions/sort.html" target="_blank" rel="nofollow">sort</a> <span class="var">@_</span> <span class="op rd">)</span><span class="op rd">]</span>, <span class="num">0</span><span class="op rd">]</span><span class="op stmt">;</span>
  <span class="op rd">}</span><span class="op stmt">;</span>

  <span class="symb">toString</span><span class="op ld">(</span><span class="symb">fold</span><span class="op ld">(</span><span class="var">@_</span><span class="op rd">)</span><span class="op rd">)</span><span class="op stmt">;</span>
<span class="op rd">}</span><span class="op stmt">;</span></code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_ynkfdog0_6')})/*]]&gt;*/</script>
<p>
Well, not so easy, but it works :)
</p>
<p>
Here, the inner function <span class="code">fold</span> builds the actual tree using the recursive helper
<span class="code">reduce</span>; each node is an array consisting of a prefix, a list of subnodes and a flag
marking the group of subnodes as optional.
The other inner function, <span class="code">toString</span>, then turns that tree into the actual
regular expression string.
So, for instance, calling
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_ynkfdog0_7">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="perl">&amp;<span class="symb">foldWordsToRegex</span><span class="op ld">(</span><span class="qlo qw"><span class="kwd">qw</span><span class="op">(</span><span class="istr"> for far bar foo boofaz boofar boof faz foobaz foobars boofar </span><span class="op">)</span></span><span class="op rd">)</span></code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_ynkfdog0_7')})/*]]&gt;*/</script>
<p>
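returns the alternation at the core of regex <span class="code">(2)</span> as a non-capturing group, i.e. <span class="code">(?:b(?:ar|oof(?:a(?:r|z))?)|f(?:a(?:r|z)|o(?:o(?:ba(?:rs|z))?|r)))</span>; the <span class="code">\b</span> anchors and the capturing group of (2) have to be added around it by the caller.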
</p>
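<p>
To illustrate the node format <span class="code">[ prefix, [ subnodes ], opt ]</span>, here is a small standalone sketch: the rendering rules of <span class="code">toString</span>, rewritten as a named sub (<span class="code">node_to_regex</span> is a made-up name), applied to a hand-built tree for the words bar, boof, boofar and boofaz. The tree mirrors the <span class="code">b</span> branch of regex <span class="code">(2)</span>.
</p>

```perl
use strict;
use warnings;

# Render a node [ prefix, [ subnodes ], opt ] as a regex string, using the
# same rules as toString: the quoted prefix, then a non-capturing group of
# the rendered subnodes, made optional with '?' if the opt flag is set.
sub node_to_regex {
    my ($prefix, $nodes, $opt) = @{ $_[0] };
    my $rv = quotemeta $prefix;
    if ( ref $nodes eq 'ARRAY' && @$nodes ) {
        $rv .= '(?:' . join( '|', map { node_to_regex($_) } @$nodes ) . ')';
        $rv .= '?' if $opt;
    }
    return $rv;
}

# Hand-built tree for bar, boof, boofar and boofaz: 'b' followed by either
# 'ar' or 'oof', where 'oof' may be followed by 'a' plus 'r' or 'z'.
my $tree = [ 'b', [
    [ 'ar' ],
    [ 'oof', [ [ 'a', [ ['r'], ['z'] ] ] ], 1 ],
] ];

my $re = node_to_regex($tree);
print "$re\n";    # b(?:ar|oof(?:a(?:r|z))?)

for my $w (qw(bar boof boofar boofaz boofa)) {
    printf "%-6s %s\n", $w, ( $w =~ /\A$re\z/ ? 'matches' : 'no match' );
}
```

The opt flag on the <span class="code">oof</span> node is what makes the trailing group optional, so that boof itself is matched as well.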
]]>
  
  </content>
</entry>

<entry>
  <title>shorten by regex</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/2009/10/shorten-by-regex" />
  <id>tag:beta-blog.net,2009://1.52378</id>

  <published>2009-10-15T20:35:50Z</published>
  <updated>2009-11-29T10:39:33Z</updated>

  <summary>While customizing my mt blog, I wondered how to abbreviate long entry titles in certain places in a nice way. Well, mt provides template tag modifiers such as trim-to, but I wanted something nicer, i.e. replacing everything after the first three (or any other maximum number of) words with an ellipsis.</summary>
  <author>
    <name>admin1</name>
    
  </author>
  
  <category term="MT hacks" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="coding" label="coding" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="mt" label="mt" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="regex" label="regex" scheme="http://www.sixapart.com/ns/types#tag" />
  
  <content type="html" xml:lang="en" xml:base="http://beta-blog.net/">
  <![CDATA[<p>
While customizing my <a href="http://beta-blog.net">mt blog</a>, I wondered how to abbreviate long entry titles in certain places in a nice way. Well, mt provides template tag modifiers such as <a rel="nofollow" target="_blank" href="http://www.movabletype.org/documentation/appendices/modifiers/trim-to.html">trim-to</a>, but I wanted something nicer, i.e. replacing everything after the first three (or any other maximum number of) words with an ellipsis. For instance,
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_owenvtwn_1">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
&quot;more than three words&quot;
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_owenvtwn_1')})/*]]&gt;*/</script>
should be shortened to 
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_owenvtwn_2">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
&quot;more than three ...&quot;
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_owenvtwn_2')})/*]]&gt;*/</script>
while titles of three words or fewer should remain as they are.
</p>

<p>
The solution is a simple regex, of course. In Perl style,
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_owenvtwn_3">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
s/^(\S+(?:\s+\S+){2})(\s+\S+)+/$1 .../
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_owenvtwn_3')})/*]]&gt;*/</script>
does the job, since it doesn't match titles of three words or fewer. The equivalent within an mt template tag:
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_owenvtwn_4">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
&lt;$mt:EntryTitle regex_replace="/^(\S+(?:\s+\S+){2})(\s+\S+)+/","$1&amp;ensp;&amp;hellip;"$&gt;
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_owenvtwn_4')})/*]]&gt;*/</script>
</p>
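<p>
The hard-coded <span class="code">{2}</span> stands for the maximum number of words minus one. A small Perl sketch that turns the maximum into a parameter (the sub name <span class="code">shorten</span> is made up here, and the maximum is assumed to be at least 1):
</p>

```perl
use strict;
use warnings;

# Keep the first $max words of $text and replace the rest by " ...".
# Texts with $max words or fewer are returned unchanged.
# (shorten is a made-up name; $max is assumed to be at least 1.)
sub shorten {
    my ($text, $max) = @_;
    die "max must be at least 1\n" unless $max >= 1;
    my $more = $max - 1;    # number of words following the first one
    ( my $short = $text ) =~ s/^(\S+(?:\s+\S+){$more})(\s+\S+)+/$1 .../;
    return $short;
}

my @titles = ( 'more than three words', 'just three words' );
for my $title (@titles) {
    printf "%s => %s\n", $title, shorten( $title, 3 );
}
# more than three words => more than three ...
# just three words => just three words
```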
]]>
  
  </content>
</entry>

<entry>
  <title>utilizing mt context column values</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/2009/08/utilizing-mt-context-column-values" />
  <id>tag:mt.beta-blog.net,2009://1.52375</id>

  <published>2009-08-29T15:56:01Z</published>
  <updated>2009-11-29T10:41:27Z</updated>

  <summary>Within a Movable Type plugin routine you may access current context column values through the context stash.
</summary>
  <author>
    <name>Sebastian</name>
    <uri>http://beta-blog.net</uri>
  </author>
  
  <category term="MT hacks" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="mt" label="mt" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
  
  <content type="html" xml:lang="en" xml:base="http://beta-blog.net/">
  <![CDATA[

Within Movable Type plugin methods you can access current context column values through the context stash.

See <a rel="nofollow" target="_blank" href="http://www.movabletype.org/documentation/developer/the-template-context.html">http://www.movabletype.org/documentation/developer/the-template-context.html</a>
for a rudimentary introduction to the concept of the template context.

<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_wm01odd_1">[-] hide code</a></legend><div class="collapsible-container"><pre class="code perl1"><span class="perl1 kwa">sub</span> handler <span class="perl1 sym">{</span>
  <span class="perl1 kwc">my</span> <span class="perl1 sym">(</span><span class="perl1 kwb">$ctx</span><span class="perl1 sym">,</span> <span class="perl1 kwb">$args</span><span class="perl1 sym">,</span> <span class="perl1 kwb">$cond</span><span class="perl1 sym">) =</span> <span class="perl1 kwb">&#64;_</span><span class="perl1 sym">;</span>

  <span class="perl1 kwa">if</span> <span class="perl1 sym">(</span> <span class="perl1 kwc">my</span> <span class="perl1 kwb">$entry_stash</span> <span class="perl1 sym">=</span> <span class="perl1 kwb">$ctx</span><span class="perl1 sym">-&gt;</span><span class="perl1 kwd">stash</span><span class="perl1 sym">(</span><span class="perl1 str">'entry'</span><span class="perl1 sym">) )</span>
  <span class="perl1 sym">{</span>
    <span class="perl1 slc"># access the corresponding mt_entry row, where keys are</span>
    <span class="perl1 slc"># column names without prefix, e.g. id, blog_id, etc.</span>
    <span class="perl1 kwc">my</span> <span class="perl1 kwb">$entry</span> <span class="perl1 sym">=</span> <span class="perl1 kwb">$entry_stash</span><span class="perl1 sym">-&gt;{</span>column_values<span class="perl1 sym">};</span>

    <span class="perl1 kwa">if</span> <span class="perl1 sym">(</span> <span class="perl1 kwc">my</span> <span class="perl1 kwb">$comment_stash</span> <span class="perl1 sym">=</span> <span class="perl1 kwb">$ctx</span><span class="perl1 sym">-&gt;</span><span class="perl1 kwd">stash</span><span class="perl1 sym">(</span><span class="perl1 str">'comment'</span><span class="perl1 sym">) )</span>
    <span class="perl1 sym">{</span>
      <span class="perl1 kwc">my</span> <span class="perl1 kwb">$comment</span> <span class="perl1 sym">=</span> <span class="perl1 kwb">$comment_stash</span><span class="perl1 sym">-&gt;{</span>column_values<span class="perl1 sym">};</span>
      <span class="perl1 slc"># mt_comment column values</span>
    <span class="perl1 sym">}</span>

    <span class="perl1 slc"># more stash blocks ...</span>
  <span class="perl1 sym">}</span>

<span class="perl1 sym">}</span></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_wm01odd_1')})/*]]&gt;*/</script>

For instance, consider a custom entry template tag used like this:

<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_wm01odd_2">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
&lt;mt:MyTemplateTag caption=&quot;&lt;$mt:EntryTitle$&gt; authored by &lt;$mt:EntryAuthor$&gt;&quot;&gt;
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_wm01odd_2')})/*]]&gt;*/</script>

This won't work as expected, since MT doesn't substitute template tags
within template tag attributes.
Using the context stash, however, you can implement a simple template tag
substitution in the plugin code itself:

<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_wm01odd_3">[-] hide code</a></legend><div class="collapsible-container"><pre class="code perl1"><span class="perl1 kwa">sub</span> MyTemplateTag <span class="perl1 sym">{</span>
  <span class="perl1 kwc">my</span> <span class="perl1 sym">(</span><span class="perl1 kwb">$ctx</span><span class="perl1 sym">,</span> <span class="perl1 kwb">$args</span><span class="perl1 sym">,</span> <span class="perl1 kwb">$cond</span><span class="perl1 sym">) =</span> <span class="perl1 kwb">&#64;_</span><span class="perl1 sym">;</span>

  <span class="perl1 slc"># some template tag attribute</span>
  <span class="perl1 kwc">my</span> <span class="perl1 kwb">$caption</span> <span class="perl1 sym">=</span> <span class="perl1 kwb">$args</span><span class="perl1 sym">-&gt;{</span><span class="perl1 str">caption</span><span class="perl1 sym">};</span>

  <span class="perl1 slc"># check for column groups within context stash</span>
  <span class="perl1 kwc">my</span> <span class="perl1 kwb">%columns</span> <span class="perl1 sym">=</span> <span class="perl1 kwc">map</span> <span class="perl1 sym">{</span>
    <span class="perl1 kwc">my</span> <span class="perl1 kwb">$s</span> <span class="perl1 sym">=</span> <span class="perl1 kwb">$ctx</span><span class="perl1 sym">-&gt;</span><span class="perl1 kwd">stash</span><span class="perl1 sym">(</span><span class="perl1 kwb">$_</span><span class="perl1 sym">);</span>
    <span class="perl1 kwb">$_</span> <span class="perl1 sym">=&gt; (</span><span class="perl1 kwc">ref</span> <span class="perl1 kwb">$s</span> ? <span class="perl1 kwb">$s</span><span class="perl1 sym">-&gt;{</span><span class="perl1 str">column_values</span><span class="perl1 sym">} :</span> <span class="perl1 kwc">undef</span><span class="perl1 sym">);</span>
  <span class="perl1 sym">}</span> <span class="perl1 kwc">qw</span><span class="perl1 str">(</span>entry comment ping<span class="perl1 sym">);</span>

  <span class="perl1 slc"># substitute &lt;$mt:FooBarName$&gt; with the column</span>
  <span class="perl1 slc"># value of bar_name in table mt_foo</span>
  <span class="perl1 kwc">local</span> <span class="perl1 sym">*</span>subst <span class="perl1 sym">=</span> <span class="perl1 kwa">sub</span> <span class="perl1 sym">{</span>
    <span class="perl1 kwc">my</span> <span class="perl1 sym">(</span><span class="perl1 kwb">$match</span><span class="perl1 sym">,</span> <span class="perl1 kwb">$name</span><span class="perl1 sym">) =</span> <span class="perl1 kwb">&#64;_</span><span class="perl1 sym">;</span>
    <span class="perl1 kwa">foreach</span> <span class="perl1 sym">(</span> <span class="perl1 kwc">keys</span> <span class="perl1 kwb">%columns</span> <span class="perl1 sym">)</span>
    <span class="perl1 sym">{</span>
      <span class="perl1 slc"># map FooBarName to column bar_name in mt_foo</span>
      <span class="perl1 kwb">$name</span> <span class="perl1 sym">=~</span> <span class="perl1 kwc">/^$_(\w+)/</span>i <span class="perl1 kwa">or next</span><span class="perl1 sym">;</span>
      <span class="perl1 kwc">my</span> <span class="perl1 kwb">$key</span> <span class="perl1 sym">=</span> <span class="perl1 kwb">$1</span><span class="perl1 sym">;</span>
      <span class="perl1 kwb">$key</span> <span class="perl1 sym">=~</span> s<span class="perl1 str">/([a-z])([A-Z])/qq($1_\l$2)/</span><span class="perl1 kwd">eg</span><span class="perl1 sym">;</span>
      <span class="perl1 kwb">$key</span> <span class="perl1 sym">=</span> lc <span class="perl1 kwb">$key</span><span class="perl1 sym">;</span>
      <span class="perl1 kwa">return</span> <span class="perl1 kwb">$columns</span><span class="perl1 sym">{</span><span class="perl1 kwb">$_</span><span class="perl1 sym">}{</span><span class="perl1 kwb">$key</span><span class="perl1 sym">}</span>
        <span class="perl1 kwa">if</span> exists <span class="perl1 kwb">$columns</span><span class="perl1 sym">{</span><span class="perl1 kwb">$_</span><span class="perl1 sym">}{</span><span class="perl1 kwb">$key</span><span class="perl1 sym">};</span>
    <span class="perl1 sym">}</span>
    <span class="perl1 kwb">$match</span><span class="perl1 sym">;</span> <span class="perl1 slc"># not recognized</span>
  <span class="perl1 sym">};</span>
  <span class="perl1 kwb">$caption</span> <span class="perl1 sym">=~</span> <span class="perl1 sym">s/</span><span class="perl1 str">(&lt;\$mt:(\w+)\s*\$&gt;)</span><span class="perl1 sym">/&amp;subst(<span class="perl1 kwb">$1</span><span class="perl1 sym">,</span><span class="perl1 kwb">$2</span>)<span class="perl1 sym">/</span></span><span class="perl1 kwd">egi</span><span class="perl1 sym">;</span>

  <span class="perl1 slc"># now do something interesting with $caption ...</span>
<span class="perl1 sym">}</span></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_wm01odd_3')})/*]]&gt;*/</script>

Of course, only plain template tags of the form &lt;$mt:TagName$&gt;, without attributes or modifiers, are recognized.
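<p>
The substitution logic can be tried out without Movable Type. The sketch below assumes a plain hash of made-up column values in place of <span class="code">$ctx-&gt;stash(...)-&gt;{column_values}</span>; otherwise it follows <span class="code">subst</span> above, except that tables without a stash entry are skipped explicitly and the table name is quoted with <span class="code">\Q...\E</span>.
</p>

```perl
use strict;
use warnings;

# Made-up column values standing in for $ctx->stash($name)->{column_values};
# no Movable Type objects are involved in this sketch.
my %columns = (
    entry   => { title => 'Hello World', author_id => 1, comment_count => 3 },
    comment => undef,    # no comment in the current context
);

# Map a tag name like EntryCommentCount to the column comment_count of the
# entry data and return its value; unknown tags are returned unchanged.
sub subst {
    my ( $match, $name ) = @_;
    for my $table ( sort keys %columns ) {
        next unless ref $columns{$table};
        $name =~ /^\Q$table\E(\w+)/i or next;
        ( my $key = $1 ) =~ s/([a-z])([A-Z])/${1}_$2/g;    # CommentCount -> Comment_Count
        $key = lc $key;
        return $columns{$table}{$key} if exists $columns{$table}{$key};
    }
    return $match;    # not recognized
}

my $caption = '<$mt:EntryTitle$> has <$mt:EntryCommentCount$> comments by <$mt:EntryAuthor$>';
$caption =~ s/(<\$mt:(\w+)\s*\$>)/subst($1, $2)/egi;
print "$caption\n";
# Hello World has 3 comments by <$mt:EntryAuthor$>
```

In this sketch, &lt;$mt:EntryAuthor$&gt; is left as is, because the made-up entry data has an author_id column but no author column.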

]]>
  
  </content>
</entry>

</feed>

