<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>beta BLOG dot NET - recently in .NET category</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/net/" />
  <link rel="self" type="application/atom+xml" href="" />
  <id>tag:beta-blog.net,2009-08-27://1</id>
  <updated>2010-11-01T21:12:54Z</updated>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.25</generator>

<entry>
  <title>Localized: The Little Eszett in the World Wide Web</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/2010/10/lokalisiert-das-kleine-eszett-im-world-wide-web" />
  <id>tag:beta-blog.net,2010://1.52394</id>

  <published>2010-10-31T14:36:51Z</published>
  <updated>2010-11-01T21:12:54Z</updated>

  <summary>As always when something keeps growing bigger and more complicated, a trend toward localization emerges. The Internet is a topological space that has become so high-dimensional that it can only be explained as a covering of some unfathomable something by local maps. The power plant of globalization now longs for semantics, seeks social contacts and organic structures. It wants to be close to people, to know their neighborhood and to speak their dialect.</summary>
  <author>
    <name>Sebastian</name>
    <uri>http://beta-blog.net</uri>
  </author>
  
  <category term=".NET" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="__meta" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="domains" label="domains" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="unicode" label="unicode" scheme="http://www.sixapart.com/ns/types#tag" />
  
  <content type="html" xml:lang="en" xml:base="http://beta-blog.net/">
  <![CDATA[<p>
As always when something keeps growing bigger and more complicated, a trend toward localization emerges. The Internet is a topological space that has become so high-dimensional that it can only be explained as a covering of some unfathomable something by local maps. The power plant of globalization now longs for semantics, seeks social contacts and organic structures. It wants to be close to people, to know their neighborhood and to speak their dialect.
</p>

<p>
The old global and normative solutions are being refined and adapted to actual needs. Take the IDNA standard, for example. The intention of IDNA is to define mappings that enable communication between the diverse and peculiar information spectra of arbitrary local entities, on the basis of ancient protocols that are hard to change, such as DNS.
</p>

<p>
In August 2010 the IETF published the IDNA2008 specifications as RFCs <a target="_blank" href="http://tools.ietf.org/html/rfc5890">5890</a> - <a target="_blank" href="http://tools.ietf.org/html/rfc5894">5894</a>. An overview of the differences between IDNA2003 and IDNA2008 is given in <a target="_blank" href="http://unicode.org/reports/tr46/">Unicode Technical Standard #46 (Unicode IDNA Compatibility Processing)</a>. Until then, IDNA was a global function, essentially consisting of the Nameprep mapping and the Punycode algorithm, where Nameprep is effectively defined by a set of tables that map, for example, uppercase to lowercase letters and &quot;&szlig;&quot; to &#8220;ss&#8221;. The old standard thus placed the responsibility for treating different languages consistently inside the IDNA protocol itself. This solution was global, simple and inflexible.
</p>

<p>
IDNA2008 introduced a new terminology that is meant to accommodate the local diversity of applications and users and to guarantee openness toward future Unicode versions. The eszett, namely the code point <code>U+00DF (LATIN SMALL LETTER SHARP S)</code>, was admitted as an exception into the category <code>PVALID</code> (Protocol Valid) at the instigation of DENIC. So while the German ligature <em>&szlig;</em> is now resolved as <code>xn--zca</code> in the DE zone, it may at the same time be resolved as <code>ss</code> in another zone. A contribution to localization, and to the business of ISPs and lawyers.
</p>

<p>
This code point now constitutes a <a target="_blank" href="http://unicode.org/reports/tr46/#Deviations">Deviation</a>, i.e. different applications may process it differently. Imagine Alice accessing her bank account from home at <em>http://www.sparkasse-gießen.de</em>. Her browser supports IDNA2003, so it maps the name to <em>http://www.sparkasse-giessen.de</em> and resolves it to the IP address of the savings bank's server. Now she visits her friend Bob and checks her balance there. Bob's browser supports IDNA2008, so for the same input it uses <em>http://www.xn--sparkasse-gieen-2ib.de</em> instead, which may resolve to an entirely different IP address. Eve's phishing server could answer at that address and thus spy out Alice's credentials.
</p>
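<p>
The divergence can be made concrete with a deliberately simplified sketch. The two helpers below are hypothetical and stand in for the mapping step only: <code>MapIdna2003Style</code> lowercases and folds &quot;&szlig;&quot; to &quot;ss&quot; the way the Nameprep tables do, while <code>MapIdna2008Style</code> lowercases and leaves &quot;&szlig;&quot; alone. Real implementations also normalize, validate and Punycode-encode the label; none of that is shown here.
</p>
<pre class="code"><code class="csharpnet">using System;

static class EszettDeviationSketch
{
    // Hypothetical helper: IDNA2003-like mapping, reduced to lowercasing
    // plus the single table entry "ß" -&gt; "ss".
    public static string MapIdna2003Style(string label)
    {
        return label.ToLowerInvariant().Replace("ß", "ss");
    }

    // Hypothetical helper: IDNA2008-like handling, where U+00DF is PVALID
    // and therefore survives unchanged (lowercasing is an application choice).
    public static string MapIdna2008Style(string label)
    {
        return label.ToLowerInvariant();
    }

    static void Main()
    {
        string input = "sparkasse-gießen";
        string alice = MapIdna2003Style(input); // sparkasse-giessen
        string bob = MapIdna2008Style(input);   // sparkasse-gießen

        // Different labels mean different DNS lookups for the same input.
        Console.WriteLine(alice == bob);        // False
    }
}
</code></pre>
<p>
Whether the two labels end up at the same server is then purely a matter of who registered which of them.
</p>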

<p>
And when, in the end, the browser despairs at the encoding of the discussion posts, it all seems somehow self-referential:
</p>

<p>
<img src="http://blog.http.net/wp-content/uploads/2010/10/eszett1.jpg" alt="discussions on ietf.idnabis" />

Ãh??? :-)
</p>

<p>
For the ISP developer, however, localization is always a challenge. He is responsible for making sure that the local units can communicate among and with each other. And who wants to converse in ASCII these days anyway.
</p>

<p>
So one day the little &quot;&szlig;&quot; lands on your desk and declares itself valid. Of course, the IDNA software buried deep in hundreds of places across all systems knows nothing of the new RFCs yet. The .NET <code>System.Globalization</code> namespace, which until then had done reliable service in normalizing, validating and converting even the most outlandish characters, cannot be persuaded under any circumstances to accept a &quot;&szlig;&quot; in domain names. And what such a DLL can't do, it simply can't do - well, on the high seas, in court and before Microsoft &#8230;
</p>

<p>
On October 26, 2010, DENIC announced that it would support IDNA2008 practically immediately and, in an initial sunrise phase, give all holders of domains containing &#8220;ss&#8221; the opportunity to register the counterpart with &quot;&szlig;&quot;. An ad hoc solution was needed within a few hours. The only obvious option was to program an automated query of the <a target="_blank" href="http://www.denic.de/domains/internationalized-domain-names/idn-konvertierung.html">DENIC web tool</a>, which at that moment offered the only known &szlig;-capable conversion mechanism. Our apprentice implemented that promptly, while RFCs were hastily studied and a sustainable solution was sought. That solution is the GNU IDN library <a target="_blank" href="http://www.gnu.org/software/libidn/">Libidn</a>. It can't handle &quot;&szlig;&quot; yet either, but it is quickly patched. The code is available in C, Java and C#, so there is something for everyone.
</p>

<p>
<img src="http://blog.http.net/wp-content/uploads/2010/10/nameprep1.jpg" alt="LibIDN source code" />
</p>

<p>
Original article in our <a href="http://blog.http.net/allgemein/lokalisiert-das-kleine-eszett-im-world-wide-web/" target="_blank">ISP Blog</a>
</p>
]]>
  

  </content>
</entry>

<entry>
  <title>Is LINQ functional?</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/2010/03/is-linq-functional" />
  <id>tag:beta-blog.net,2010://1.52390</id>

  <published>2010-03-31T20:11:19Z</published>
  <updated>2010-04-02T01:17:02Z</updated>

  <summary>With its 3.5 extensions, the .NET framework started to turn into a really cool looking programming concept, not least due to the syntactic sugar of LINQ. One reason for that is surely its functional look. Well, as LINQ is integrated into an imperative context, it will never be able to guarantee state-free evaluation as a genuine functional language does. Nevertheless it&apos;s worth discussing and playing around with a few aspects of it in terms of a multi-paradigm programming concept. </summary>
  <author>
    <name>Sebastian</name>
    <uri>http://beta-blog.net</uri>
  </author>
  
  <category term=".NET" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="algorithms" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="net" label=".NET" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="c" label="C#" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="math" label="math" scheme="http://www.sixapart.com/ns/types#tag" />
  
  <content type="html" xml:lang="en" xml:base="http://beta-blog.net/">
  <![CDATA[<p>
With its 3.5 extensions, the .NET framework started to turn into a really
cool looking programming concept,
not least due to the syntactic sugar of
<a href="http://msdn.microsoft.com/en-us/library/bb397676.aspx" target="_blank">LINQ</a>.
One reason for that is surely its <a href="http://en.wikipedia.org/wiki/Functional_programming" target="_blank">functional</a>
look.
Well, as LINQ is integrated into an imperative context, it will never be able to
guarantee state-free evaluation as a genuine functional language does.
Nevertheless it's worth discussing and playing around with a few aspects of it
in terms of a multi-paradigm programming concept.
</p>

<h3>Delegating definitions in C# 3.0</h3>
<p>
Firstly, the concept of
<a href="http://en.wikipedia.org/wiki/First-class_function" target="_blank">first-class functions</a>,
i.e. the invention of the function type, leads to the notion of closures.
So for instance, a constant function such as
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Func</span>&lt;<span class="kwd builtin">int</span>&gt; <span class="type">i</span> = () =&gt; <span class="num">1</span>;
</code></pre>
<p>
defines something like a readonly variable.
You may get its value now, later or never,
but you can always be sure that its value will never be changed anywhere in your code.
Hence, you have won a quantum of control over your program by this
weird piece of code.
That's a basic idea of functional programming.
</p>

<p>
The concept of function types leads to higher-order
functions, i.e. functions mapping functions to other functions.
This brings up the <a href="http://en.wikipedia.org/wiki/Currying" target="_blank">curry functor</a>,
a key concept in the theory of functional programming:
</p>
<p class="quote">
<span class="math">curry: (X <span class="small">x</span> Y &rarr; Z) &rarr; (X  &rarr; Y  &rarr; Z)</span>
</p>
<p>
That is, for any function <span class="math">f(x,y)</span>, there is a curried function
<span class="math">curry(f)</span>
taking <span class="math">x</span> to a function <span class="math">g(y) = f(x,y)</span>, so that <span class="math">curry(f)(x)(y) = f(x,y)</span>.
This is now implemented easily in C# using generic types:
</p>
<pre class="code">
<code class="csharpnet"><span class="kwd builtin">static</span> <span class="kwd def">Func</span>&lt;<span class="type">X</span>, <span class="kwd def">Func</span>&lt;<span class="type">Y</span>, <span class="type">Z</span>&gt;&gt; <span class="type">Curry</span>&lt;<span class="type">X</span>, <span class="type">Y</span>, <span class="type">Z</span>&gt;(<span class="kwd def">Func</span>&lt;<span class="type">X</span>, <span class="type">Y</span>, <span class="type">Z</span>&gt; <span class="type">f</span>)
{
  <span class="kwd builtin">return</span> <span class="type">x</span> =&gt; <span class="type">y</span> =&gt; <span class="type">f</span>(<span class="type">x</span>, <span class="type">y</span>);
}
</code></pre>
<p>
(inspired by this <a target="_blank" href="http://jacobcarpenter.wordpress.com/2008/01/02/c-abuse-of-the-day-functional-library-implemented-with-lambdas/">C# abuse of the day</a>).
Well, that's more or less of academic interest, since one would hardly ever replace
<span class="code">x++</span> by
</p>
<pre class="code">
<code class="csharpnet"><span class="type">x</span> = <span class="type">Curry</span>&lt;<span class="kwd builtin">int</span>, <span class="kwd builtin">int</span>, <span class="kwd builtin">int</span>&gt;((<span class="type">a</span>, <span class="type">b</span>) =&gt; <span class="type">a</span> + <span class="type">b</span>)(<span class="num">1</span>)(<span class="type">x</span>); <span class="cmnt">// x++ ;)</span>
</code></pre>
<p>
A slightly more interesting example is the following:
</p>
<pre class="code">
<code class="csharpnet"><span class="cmnt">// using System.Text.RegularExpressions;</span>
<span class="kwd builtin">var</span> <span class="type">grep</span> = <span class="type">Curry</span>&lt;<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx" target="_blank" rel="nofollow">Regex</a>, <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">string</span>&gt;, <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">string</span>&gt;&gt;(
  (<span class="type">regex</span>, <span class="type">list</span>) =&gt; <span class="kwd builtin">from</span> <span class="type">s</span> <span class="kwd builtin">in</span> <span class="type">list</span>
                   <span class="kwd builtin">where</span> <span class="type">regex.Match</span>(<span class="type">s</span>).<span class="type">Success</span>
                   <span class="kwd builtin">select</span> <span class="type">s</span>);
<span class="kwd builtin">var</span> <span class="type">grepFoo</span> = <span class="type">grep</span>(<span class="kwd builtin">new</span> <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx" target="_blank" rel="nofollow">Regex</a>(<span class="str">&quot;foo&quot;</span>));
</code></pre>
<p>
Thus, <span class="code">grepFoo</span> will grep all words containing
<code class="csharpnet"><span class="str">&quot;foo&quot;</span></code>
from a wordlist.
Attention should be paid to the fact that with the statement
</p>
<pre class="code">
<code class="csharpnet"><span class="kwd builtin">var</span> <span class="type">fooList</span> = <span class="type">grepFoo</span>(<span class="kwd builtin">new</span> <span class="kwd builtin">string</span>[]{<span class="str">&quot;foo&quot;</span>, <span class="str">&quot;bar&quot;</span>, <span class="str">&quot;foobar&quot;</span>});
</code></pre>
<p>
no regex has been applied yet.
Indeed, <code class="csharpnet"><span class="type">fooList</span></code>
is of type
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">string</span>&gt;</code>
and not yet enumerated at this point.
So the evaluation of the expression is deferred until its result is needed by another computation
- smells like lazy evaluation.
</p>
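<p>
The deferral can be made visible with a counter. The following is only a sketch: the class, the <code>Calls</code> field and the <code>GrepFoo</code> helper are made up for this illustration, and a plain <code>Contains</code> stands in for the regex. The counter is bumped every time the filter really runs.
</p>
<pre class="code"><code class="csharpnet">using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredExecutionSketch
{
    public static int Calls; // counts how often the filter actually runs

    // Hypothetical stand-in for grepFoo: filters words containing "foo".
    public static IEnumerable&lt;string&gt; GrepFoo(IEnumerable&lt;string&gt; words)
    {
        return from s in words
               where ++Calls &gt; 0 &amp;&amp; s.Contains("foo")
               select s;
    }

    static void Main()
    {
        var fooList = GrepFoo(new[] { "foo", "bar", "foobar" });

        Console.WriteLine(Calls);           // 0: nothing has been filtered yet
        Console.WriteLine(fooList.Count()); // 2: enumerating runs the filter
        Console.WriteLine(Calls);           // 3: once per input element
        fooList.ToList();
        Console.WriteLine(Calls);           // 6: the second pass started over
    }
}
</code></pre>
<p>
Note the last line: nothing is remembered between two enumerations, each one runs the whole query again.
</p>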

<h3>LINQ is not lazy!</h3>
<p>
One of the most important paradigms of functional programming is the concept of
<a href="http://en.wikipedia.org/wiki/Lazy_evaluation" target="_blank">lazy evaluation</a>.
For instance, in a functional language, such as the good old
<a href="http://haskell.org/" target="_blank">Haskell</a>,
an expression such as
</p>
<pre class="code"><code>length [1, 2, 3 `div` 0]
</code></pre>
<p>
evaluates to <span class="code">3</span>.
That is, the evaluator is too lazy to fail on division by zero,
neither at compile time nor at run time, since it doesn't need to know any element
inside the list in order to calculate its length.
In <em>C#</em> (where you aren't even able to compile a constant expression such as <span class="code">1/0</span>),
you may write
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">var</span> <span class="type">q1</span> = <span class="kwd builtin">from</span> <span class="type">i</span> <span class="kwd builtin">in</span> (<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">int</span>&gt;)<span class="kwd builtin">new</span> <span class="kwd builtin">int</span>[] { <span class="num">1</span>, <span class="num">2</span>, <span class="num">3</span> }
         <span class="kwd builtin">select</span> <span class="num">1</span>/(<span class="type">i</span> - <span class="num">3</span>);
</code></pre>
<p>
without getting a run time error.
But this has nothing to do with lazy evaluation, since the query expression isn't evaluated at all at this point
(in contrast to the array definition inside the query), so the query expression is simply treated as a function definition.
However, as soon as an aggregation expression such as
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">int</span> <span class="type">three</span> = <span class="type">q1.Count</span>();
</code></pre>
<p>
is reached, a
<span class="code"><code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.dividebyzeroexception.aspx" target="_blank" rel="nofollow">DivideByZeroException</a></code></span>
will be thrown.
Thus, LINQ evaluates eagerly here, not lazily.
On the other hand,
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">int</span> <span class="type">two</span> = <span class="type">q1.Take</span>(<span class="num">2</span>).<span class="type">Count</span>();
</code></pre>
<p>
works fine, since the black hole stays unevaluated due to the <code>Take</code> operator.
But, having
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">var</span> <span class="type">q2</span> = <span class="kwd builtin">from</span> <span class="type">i</span> <span class="kwd builtin">in</span> (<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">int</span>&gt;)<span class="kwd builtin">new</span> <span class="kwd builtin">int</span>[] { <span class="num">1</span>, <span class="num">2</span>, <span class="num">3</span> }
         <span class="kwd builtin">select</span> <span class="num">1</span>/(<span class="type">i</span> - <span class="num">1</span>);
<span class="kwd builtin">int</span> <span class="type">two2</span> = <span class="type">q2.Skip</span>(<span class="num">1</span>).<span class="type">Count</span>();
</code></pre>
<p>
instead, you will - guess what! - catch the exception again.
Thus, in contrast to the <span class="code"><code class="csharpnet"><span class="type">Take</span></code></span> operator,
the <span class="code"><code class="csharpnet"><span class="type">Skip</span></code></span> operator
does iterate through the skipped elements and hence evaluates them,
at least in the .NET 3.5 implementation discussed here.
Ok, that's no surprise, since these operators are using the
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.aspx" target="_blank" rel="nofollow">IEnumerator</a></code>
provided by the corresponding
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a></code>.
So, LINQ pretends to be lazy in the way that
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">var</span> <span class="type">p</span> = <span class="type">q2.Reverse</span>();
</code></pre>
<p>
won't be evaluated at this point and thus doesn't fail, whereas
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">int</span> <span class="type">two3</span> = <span class="type">p.Take</span>(<span class="num">2</span>).<span class="type">Count</span>();
</code></pre>
<p>
then throws the exception again, even though the evil element shouldn't be taken here:
<code>Reverse</code> has to buffer the whole sequence before it can yield anything.
</p>
<p>
A functional approach to force laziness would be to replace value expressions by
constant functions, but the compiler won't accept something like this:
</p>
<pre class="code"><code class="csharpnet"><span class="cmnt">// The type of the expression in the select clause is incorrect.</span>
<span class="cmnt">// Type inference failed in the call to &#039;Select&#039;.</span>
<span class="kwd builtin">var</span> <span class="type">q1_</span> = <span class="kwd builtin">from</span> <span class="type">i</span> <span class="kwd builtin">in</span> (<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">int</span>&gt;)<span class="kwd builtin">new</span> <span class="kwd builtin">int</span>[] { <span class="num">1</span>, <span class="num">2</span>, <span class="num">3</span> }
          <span class="kwd builtin">select</span> () =&gt; <span class="num">1</span> / (<span class="type">i</span> - <span class="num">3</span>);
</code></pre>
<p>
Hence, LINQ isn't lazy, but has a smart way to make function definitions
look like statement expressions.
</p>
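<p>
The compiler message above is about type inference: with the C# 3.0 compiler, a bare lambda has no type of its own.
One possible workaround, offered here only as a sketch, is to give the lambda an explicit delegate type by a cast.
The class and method names below are made up for the example.
</p>
<pre class="code"><code class="csharpnet">using System;
using System.Collections.Generic;
using System.Linq;

static class ThunkSketch
{
    // Wraps every result in a parameterless delegate. The cast supplies the
    // delegate type that the compiler cannot infer on its own.
    public static IEnumerable&lt;Func&lt;int&gt;&gt; Thunks()
    {
        return from i in new[] { 1, 2, 3 }
               select (Func&lt;int&gt;)(() =&gt; 1 / (i - 3));
    }

    static void Main()
    {
        var q = Thunks();
        Console.WriteLine(q.Count());   // 3: no division has been performed
        Console.WriteLine(q.First()()); // 0: integer division of 1 by -2
        try
        {
            q.Last()();                 // invoking the third thunk divides by zero
        }
        catch (DivideByZeroException)
        {
            Console.WriteLine("division by zero, but only on demand");
        }
    }
}
</code></pre>
<p>
So the laziness has to be spelled out by hand, one delegate per element, and nothing is cached between two invocations of the same thunk.
</p>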


<h3>Diving into recursion</h3>
<p>
Remember the famous
<a href="http://en.wikipedia.org/wiki/Fibonacci_number" target="_blank">Fibonacci numbers</a>:
</p>
<p class="quote">
<span class="math">fib<sub>0</sub> = 0, fib<sub>1</sub> = 1, fib<sub>n</sub> = fib<sub>n-1</sub> + fib<sub>n-2</sub>.</span>
</p>
<p>
The sequence starts with
</p>
<p class="quote">
<span class="math">fibs = 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...</span>
</p>
<p>
where <span class="math">fibs<sub>100</sub></span> is a number consisting of 21 digits, so the sequence grows quite fast.
Although one may calculate Fibonacci numbers in closed form using
<a href="http://mathworld.wolfram.com/BinetsFibonacciNumberFormula.html" target="_blank">Binet's formula</a>,
the definition leads to interesting comparisons of different recursion strategies.
</p>

<p>
Well, let's have a
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">delegate</span> <span class="kwd builtin">long</span> <span class="kwd def">Fibonacci</span>(<span class="kwd builtin">int</span> <span class="type">n</span>);
</code></pre>
<p>
A direct translation of the definition into a lambda recursion looks like this:
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Fibonacci</span> <span class="type">fib1</span> = <span class="kwd builtin">null</span>; <span class="cmnt">// pre-assigned for use within recursion</span>
<span class="type">fib1</span> = <span class="type">n</span> =&gt; <span class="type">n</span> &lt;= <span class="num">1</span> ? <span class="type">n</span> : <span class="type">fib1</span>(<span class="type">n</span> - <span class="num">1</span>) + <span class="type">fib1</span>(<span class="type">n</span> - <span class="num">2</span>);
</code></pre>
<p>
The funny thing about this implementation is that the Fibonacci function itself determines its run time:
It's <span class="math">O(fib<sub>n</sub>)</span>, i.e. lower values will be
recalculated again and again in order to get a higher one, due to the lack of an aggregating strategy.
</p>
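<p>
One common way to add such an aggregating strategy is memoization. The following sketch is not part of the approach discussed here;
the class, the <code>Memoized</code> helper and the dictionary cache are assumptions made for the illustration.
Every value is stored once it has been computed, so each <span class="math">fib<sub>k</sub></span> is calculated only once.
</p>
<pre class="code"><code class="csharpnet">using System;
using System.Collections.Generic;

static class MemoSketch
{
    public delegate long Fibonacci(int n);

    // Returns a Fibonacci delegate that remembers every value it has computed.
    public static Fibonacci Memoized()
    {
        var cache = new Dictionary&lt;int, long&gt;();
        Fibonacci fib = null; // pre-assigned for use within recursion
        fib = n =&gt;
        {
            if (n &lt;= 1) return n;
            long value;
            if (!cache.TryGetValue(n, out value))
            {
                value = fib(n - 1) + fib(n - 2);
                cache[n] = value; // store the result for later calls
            }
            return value;
        };
        return fib;
    }

    static void Main()
    {
        Fibonacci fibMemo = Memoized();
        Console.WriteLine(fibMemo(50)); // 12586269025
    }
}
</code></pre>
<p>
Keep in mind that a <code>long</code> only holds the values up to <span class="math">fib<sub>92</sub></span>; beyond that the addition overflows.
</p>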

<p>
Now, in Haskell you may get around this very elegantly by defining an infinite list:
</p>
<pre class="code"><code>fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
</code></pre>
<p>
The list is initialized with two elements.
Then, notionally, the <code>tail</code> function drops the first element of the <code>fibs</code> list,
while <code>zipWith (+)</code> creates a new list by adding the elements of
<code>fibs</code> and <code>(tail fibs)</code> pairwise.
But in practice, Haskell is smart and lazy enough to avoid any needless recalculation
of numbers already present in the <code>fibs</code> list.
Thus, the algorithm applied here is the same one a human being would apply spontaneously using a
pencil and a scrap of paper. So, it's <span class="math">O(n)</span>.
</p>

<p>
To define an infinite list in C#, one should
implement the
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a></code>
interface in such a way that
the corresponding
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.aspx" target="_blank" rel="nofollow">IEnumerator</a></code>
expands the list on demand within its
<code class="csharpnet"><span class="type">MoveNext</span>()</code>
method.
Here, a little one-liner is enough,
taking a list and an expanding function to a
<code class="csharpnet"><span class="kwd def">Fibonacci</span></code> delegate:
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Func</span>&lt;
  <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;,
  <span class="kwd def">Func</span>&lt;<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;, <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;&gt;,
  <span class="kwd def">Fibonacci</span>&gt; <span class="type">infList</span> = <span class="kwd builtin">null</span>;
<span class="type">infList</span> = (<span class="type">list</span>, <span class="type">exp</span>) =&gt; <span class="type">n</span> =&gt; <span class="type">n</span> &lt; <span class="type">list.Count</span>() ?
  <span class="type">list.Skip</span>(<span class="type">n</span>).<span class="type">First</span>() : <span class="type">infList</span>(<span class="type">exp</span>(<span class="type">list</span>), <span class="type">exp</span>)(<span class="type">n</span>);
</code></pre>
<p>
Now, C# also provides a <code>Zip</code> function.
So, a simple syntactic translation of the Haskell list would look like this:
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Func</span>&lt;<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;, <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;&gt; <span class="type">fibZip</span> = <span class="type">fibs</span> =&gt;
  <span class="type">fibs.Take</span>(<span class="num">2</span>).<span class="type">Concat</span>(<span class="type">fibs.Zip</span>(<span class="type">fibs.Skip</span>(<span class="num">1</span>), (<span class="type">x</span>, <span class="type">y</span>) =&gt; <span class="type">x</span> + <span class="type">y</span>));
</code></pre>
<p>
Hm, but this one is even worse than the naive recursion.
Indeed, trying
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Fibonacci</span> <span class="type">fib2</span> = <span class="type">infList</span>(<span class="kwd builtin">new</span> <span class="kwd builtin">long</span>[] { <span class="num">0</span>, <span class="num">1</span> }, <span class="type">fibZip</span>);
</code></pre>
<p>
then, you will see that aggregation doesn't work at all this way, since the concept
of enumeration is not functional.
We may repair the <code>fibZip</code> as follows:
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Func</span>&lt;<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;, <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;&gt; <span class="type">fibZip2</span> = <span class="type">fibs</span> =&gt;
  <span class="type">fibs.Concat</span>((<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;)(<span class="kwd builtin">new</span> <span class="kwd builtin">long</span>[] {
    <span class="type">fibs.Skip</span>(<span class="type">fibs.Count</span>() - <span class="num">2</span>).<span class="type">Sum</span>() }));
</code></pre>
<p>
This one looks a bit weird, since it's not that easy to extend an
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a></code>
by one element. Anyway,
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Fibonacci</span> <span class="type">fib3</span> = <span class="type">infList</span>(<span class="kwd builtin">new</span> <span class="kwd builtin">long</span>[] { <span class="num">0</span>, <span class="num">1</span> }, <span class="type">fibZip2</span>);
</code></pre>
<p>
indeed gets by with <span class="math">O(n)</span> additions
(ignoring the cost of walking the nested <code>Concat</code> chain),
even though the idea of an infinite list has lost its magic this way.
</p>
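<p>
For comparison, the imperative way to get an unbounded sequence in C# is an iterator block.
The sketch below is not a translation of the Haskell list and not one of the variants above; the names are made up for the example.
Each value is produced only when a consumer asks for it, but every new enumeration starts again from zero.
</p>
<pre class="code"><code class="csharpnet">using System;
using System.Collections.Generic;
using System.Linq;

static class IteratorSketch
{
    // An unbounded sequence: the loop only advances when MoveNext is called.
    public static IEnumerable&lt;long&gt; Fibs()
    {
        long a = 0, b = 1;
        while (true)
        {
            yield return a;
            long next = a + b; // wraps around silently beyond fib(92)
            a = b;
            b = next;
        }
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Fibs().Take(10)));
        // 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
        Console.WriteLine(Fibs().ElementAt(50)); // 12586269025
    }
}
</code></pre>
<p>
This runs in <span class="math">O(n)</span> as well, but it gets there by mutable state inside the iterator, which is exactly what a functional language avoids.
</p>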

<h3>Conclusion</h3>
<p>
As expected, neither C# nor LINQ turns out to implement
the paradigms of a functional language.
Nonetheless, it's really fancy. 8-)
</p>
]]>
  
  </content>
</entry>

<entry>
  <title>understanding unicode surrogates / or: how to deal with Linear B strings in .NET</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/2009/11/understanding-unicode-surrogates-or-how-to-deal-with-linear-b-strings-in-net" />
  <id>tag:beta-blog.net,2009://1.52383</id>

  <published>2009-11-17T20:23:58Z</published>
  <updated>2010-07-14T21:34:53Z</updated>

  <summary>Remember a String object in .NET is a collection of Char objects, where a Char object in turn is announced as a Unicode character, encoded by a 16-bit unsigned integer. Thus, more precisely speaking, a single Char object is able to encode any code point within the basic multilingual plane (BMP), i.e. between U+0000 and U+FFFF. So, where does the rest of the story go? Unicode, as a universal character set, is designed to support many more than 65536 characters of course.
</summary>
  <author>
    <name>Sebastian</name>
    <uri>http://beta-blog.net</uri>
  </author>
  
  <category term=".NET" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="codes" label="codes" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="math" label="math" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="unicode" label="unicode" scheme="http://www.sixapart.com/ns/types#tag" />
  
  <content type="html" xml:lang="en" xml:base="http://beta-blog.net/">
  <![CDATA[<p>
Remember a <span class="code cs1"><span class="ob">String</span></span> object
in .NET is a collection of <span class="code cs1"><span class="ob">Char</span></span>
objects, where a <span class="code cs1"><span class="ob">Char</span></span> object
in turn is announced as a
<a href="http://en.wikipedia.org/wiki/Unicode" target="_blank">Unicode character</a>,
encoded by a 16-bit unsigned integer.
Thus, more precisely speaking, a single <span class="code cs1"><span class="ob">Char</span></span>
object is able to encode any code point within the
<a href="http://en.wikipedia.org/wiki/Mapping_of_Unicode_character_planes#Basic_Multilingual_Plane" target="_blank">basic multilingual plane (BMP)</a>,
i.e. between <span class="code">U+0000</span> and <span class="code">U+FFFF</span>.
So, where does the rest of the story go? Unicode, as a universal character set,
is designed to support many more than 65536 characters of course.
</p>
<p>
Now, the trick is to encode code points from <span class="code">2<sup>16</sup></span> upward
by so-called surrogates, that is, by pairs of 16-bit integers.
To see how this works, remember the well-known
<a href="http://en.wikipedia.org/wiki/Division_algorithm" target="_blank">division algorithm</a>
for integers. That is, if you fix a bit width <span class="math">M</span> and
an integer constant <span class="math">C (0 &lt; C &lt; M)</span>, then
for any integer <span class="math">N</span> within the range
<span class="math">0 &le; N &lt; 2<sup>M</sup></span>
there exists a unique pair of integers <span class="math">H,L</span>, such that
</p>
<p class="quote">
<span class="math">N = 2<sup>C</sup> * H + L,</span> where <span class="math">0 &le; L &lt; 2<sup>C</sup></span> and <span class="math">0 &le; H &lt; 2<sup>M - C</sup></span>.
</p>
<p>
That way you have simply encoded these <span class="math">2<sup>M</sup></span> numbers
<span class="math">N</span> by <span class="math">2<sup>C</sup> * 2<sup>M - C</sup></span> pairs
of numbers <span class="math">H,L</span>.
Hence <span class="math">2<sup>M</sup></span> large numbers are addressed using a set of only
<span class="math">2<sup>C</sup> + 2<sup>M-C</sup></span> small numbers. That's the trick.
</p>
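<p>
As a toy example (the numbers are chosen here purely for illustration), take <span class="math">M = 4</span> and <span class="math">C = 2</span>. Then, for instance,
</p>
<p class="quote">
<span class="math">13 = 2<sup>2</sup> * 3 + 1,</span> i.e. <span class="math">H = 3</span> and <span class="math">L = 1</span>,
</p>
<p>
and all <span class="math">2<sup>4</sup> = 16</span> values of <span class="math">N</span> are covered by pairs drawn from only <span class="math">2<sup>2</sup> + 2<sup>2</sup> = 8</span> small numbers: four possible values of <span class="math">H</span> plus four of <span class="math">L</span>.
</p>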

<p>
As we are interested in encoding integers from <span class="math">2<sup>16</sup></span> upward
by pairs of 16-bit integers, we should act on the assumption
</p>
<p class="quote">
<span class="math">2<sup>16</sup> &le; N' &lt; 2<sup>16</sup> + 2<sup>M</sup></span>,
</p>
<p>
and then deal with <span class="code">N = N' - 2<sup>16</sup></span>.
To be able to tell whether a given 16-bit number belongs to a surrogate pair,
and whether it plays the role of <span class="code">H</span> or <span class="code">L</span>,
finally fix a suitable constant <span class="code">T</span> and set
</p>
<p class="quote">
<span class="math">H' = H + T, L' = L + T + 2<sup>M-C</sup>,</span>
</p>
<p>
thus having tagged all 16-bit integers <span class="math">I</span> satisfying
<span class="math">T &le; I &lt; T + 2<sup>M-C</sup> + 2<sup>C</sup></span>
as surrogate integers, where the high surrogates of type <span class="math">H'</span>
are less than <span class="math">T + 2<sup>M-C</sup></span> and
the ones at or above that bound are the low surrogates of type <span class="math">L'</span>.
</p>

<p>
Now, the setting of Unicode is this: <span class="math">C = 10, M = 20, T = 0xD800</span>.
So, by reserving 2048 small integers as
surrogates, more than a million additional code points up to
<span class="code">U+10FFFF</span> become accessible. The resulting formulas may be found here:
<a href="http://www.unicode.org/book/ch03.pdf" target="_blank">http://www.unicode.org/book/ch03.pdf</a>.
</p>
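<p>
To make the arithmetic concrete, here is a minimal sketch of these formulas in C#. The class and method names (<span class="code">SurrogateSketch</span>, <span class="code">Encode</span>, <span class="code">Decode</span>) are made up for illustration and are not part of .NET; only <span class="code">Char.ConvertFromUtf32()</span> is the framework's own method, used here as a cross-check.
</p>
<pre class="code"><code class="csharpnet">using System;

static class SurrogateSketch
{
    const int T = 0xD800;        // first surrogate value
    const int Block = 0x400;     // 2^10 = 2^C = 2^(M-C) in the Unicode setting
    const int Offset = 0x10000;  // 2^16, the first code point beyond the BMP

    // Splits a code point N' (U+10000..U+10FFFF) into the pair (H', L').
    public static void Encode(int codepoint, out char high, out char low)
    {
        if (codepoint &lt; Offset || codepoint &gt; 0x10FFFF)
            throw new ArgumentOutOfRangeException(&quot;codepoint&quot;);
        int n = codepoint - Offset;            // N = N' - 2^16
        high = (char)(T + n / Block);          // H' = H + T
        low = (char)(T + Block + n % Block);   // L' = L + T + 2^10
    }

    // Inverse direction: N' = 2^16 + 2^10 * H + L.
    public static int Decode(char high, char low)
    {
        return Offset + (high - T) * Block + (low - T - Block);
    }

    static void Main()
    {
        char high, low;
        Encode(0x10016, out high, out low);
        Console.WriteLine(&quot;{0:X4} {1:X4}&quot;, (int)high, (int)low);  // D800 DC16
        Console.WriteLine(&quot;U+{0:X}&quot;, Decode(high, low));            // U+10016
        // cross-check against the framework: prints True
        Console.WriteLine(Char.ConvertFromUtf32(0x10016) == new string(new[] { high, low }));
    }
}</code></pre>
<p>
Division and remainder are used instead of bit shifts on purpose, so that the code reads exactly like the division algorithm above.
</p>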

<p>
Thankfully, .NET unicoders don't need to do this hex arithmetic by hand, because it's
ready-made.
For instance, consider the name of
<a href="http://en.wikipedia.org/wiki/Amnisos" target="_blank">Amnissos</a>
written in <a href="http://en.wikipedia.org/wiki/Linear_B" target="_blank">Linear B</a>:
</p>
<p class="quote">
<img src="http://beta-blog.net/2009/11/18/linearb_u10000.gif" alt="U+10000" /><img src="http://beta-blog.net/2009/11/18/linearb_u10016.gif" alt="U+10016" /><img src="http://beta-blog.net/2009/11/18/linearb_u1001B.gif" alt="U+1001B" /><img src="http://beta-blog.net/2009/11/18/linearb_u10030.gif" alt="U+10030" /></p>
<p>
In C# it looks like this:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_xlcrnc5u_1">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="csharpnet"><span class="cmnt">// alternatively the Char.ConvertFromUtf32() method may be used</span>
<span class="kwd builtin">string</span> <span class="type">amnisos</span> = <span class="str">&quot;\U00010000&quot;</span> + <span class="str">&quot;\U00010016&quot;</span> + <span class="str">&quot;\U0001001B&quot;</span> + <span class="str">&quot;\U00010030&quot;</span>;</code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_xlcrnc5u_1')})/*]]&gt;*/</script>
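<p>
The alternative mentioned in the comment could look roughly like this. It is only a sketch: the class name <span class="code">AmnisosDemo</span> and the variable names are made up here, while <span class="code">Char.ConvertFromUtf32()</span> is the framework method that turns a single code point into a string of one or two <span class="code">Char</span> objects.
</p>
<pre class="code"><code class="csharpnet">using System;

static class AmnisosDemo
{
    static void Main()
    {
        // the same four Linear B syllables, once via escapes, once via code points
        string viaEscapes = &quot;\U00010000\U00010016\U0001001B\U00010030&quot;;
        string viaConvert = Char.ConvertFromUtf32(0x10000) + Char.ConvertFromUtf32(0x10016)
                          + Char.ConvertFromUtf32(0x1001B) + Char.ConvertFromUtf32(0x10030);

        Console.WriteLine(viaEscapes == viaConvert);  // True
        Console.WriteLine(viaConvert.Length);         // 8
    }
}</code></pre>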
<p>
Note that the <span class="code cs1"><span class="sym">Length</span></span> property
of the resulting string indeed has a value of 8, while it contains only 4 Unicode characters.
So the appropriate way of accessing the actual code points of an arbitrary string
should make use of
<span class="code">System.Globalization.TextElementEnumerator</span>
rather than simply accessing
<span class="code cs1"><span class="ob">Char</span></span> objects naively.
It goes like this:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_xlcrnc5u_2">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="csharpnet"><span class="cmnt">// using System.Globalization;</span>
<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.globalization.textelementenumerator.aspx" target="_blank" rel="nofollow">TextElementEnumerator</a> <span class="type">en</span> = <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.globalization.stringinfo.aspx" target="_blank" rel="nofollow">StringInfo</a>.<span class="type">GetTextElementEnumerator</span>(<span class="type">amnisos</span>);
<span class="kwd builtin">while</span> (<span class="type">en.MoveNext</span>())
{
  <span class="kwd builtin">string</span> <span class="type">current</span> = <span class="type">en.GetTextElement</span>();
  <span class="kwd builtin">if</span> (<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.char.aspx" target="_blank" rel="nofollow">Char</a>.<span class="type">IsSurrogate</span>(<span class="type">current</span>, <span class="num">0</span>))
  {
    <span class="cmnt">// a surrogate pair encoding one character, i.e. current.Length == 2</span>
    <span class="kwd builtin">int</span> <span class="type">codepoint</span> = <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.char.aspx" target="_blank" rel="nofollow">Char</a>.<span class="type">ConvertToUtf32</span>(<span class="type">current</span>[<span class="num">0</span>], <span class="type">current</span>[<span class="num">1</span>]);
    <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.console.aspx" target="_blank" rel="nofollow">Console</a>.<span class="type">WriteLine</span>(<span class="str">&quot;U+{0:X6}&quot;</span>, <span class="type">codepoint</span>);
  }
  <span class="kwd builtin">else</span>
  {
    <span class="cmnt">// characters within BMP:</span>
    <span class="cmnt">// current.Length &gt; 1 may be true in case of combining characters </span>
    <span class="cmnt">// cf. StringInfo.ParseCombiningCharacters()</span>
    <span class="kwd builtin">foreach</span> (<span class="kwd builtin">char</span> <span class="type">c</span> <span class="kwd builtin">in</span> <span class="type">current</span>)
    {
      <span class="kwd builtin">int</span> <span class="type">codepoint</span> = (<span class="kwd builtin">int</span>)<span class="type">c</span>; <span class="cmnt">// use AscW() in VB.NET</span>
      <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.console.aspx" target="_blank" rel="nofollow">Console</a>.<span class="type">WriteLine</span>(<span class="str">&quot;U+{0:X4}&quot;</span>, <span class="type">codepoint</span>);
    }
  }
}</code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_xlcrnc5u_2')})/*]]&gt;*/</script>
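<p>
If only the code points matter and combining sequences are of no interest, a plain index loop may be enough. The following is a sketch under that assumption; the helper <span class="code">CodePoints.Enumerate</span> is a made-up name, built only on the framework methods <span class="code">Char.IsHighSurrogate()</span>, <span class="code">Char.IsLowSurrogate()</span> and <span class="code">Char.ConvertToUtf32()</span>.
</p>
<pre class="code"><code class="csharpnet">using System;
using System.Collections.Generic;

static class CodePoints
{
    // Yields one integer per code point; an unpaired surrogate is passed through as-is.
    public static IEnumerable&lt;int&gt; Enumerate(string s)
    {
        for (int i = 0; i &lt; s.Length; i++)
        {
            if (Char.IsHighSurrogate(s[i]) &amp;&amp; i + 1 &lt; s.Length &amp;&amp; Char.IsLowSurrogate(s[i + 1]))
            {
                yield return Char.ConvertToUtf32(s[i], s[i + 1]);
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                yield return s[i]; // a character within the BMP
            }
        }
    }

    static void Main()
    {
        // prints U+10000, U+10016, U+1001B, U+10030, one per line
        foreach (int codepoint in Enumerate(&quot;\U00010000\U00010016\U0001001B\U00010030&quot;))
        {
            Console.WriteLine(&quot;U+{0:X4}&quot;, codepoint);
        }
    }
}</code></pre>
<p>
Unlike the text element approach, this loop does not group a base character with its combining marks, so it answers the question "which code points?" rather than "which visible characters?".
</p>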
<p>
Now, when will we finally be able to register Linear B domain names? ;)
</p>
]]>
  
  </content>
</entry>

</feed>

