Oh yeah, CallStream is great to express monads
It has been pointed out by several commenters that CallStream was a reinvention of monads. Not quite so, but the nuance is subtle.
CallStream is a pattern to express chain-callable APIs. That makes it possible to express monads with CallStream, but in the same way that JavaScript functions and C# delegates do.
Expressing monads with CallStreams can be done in a way that is quite expressive, but let me switch from the usual C# to JavaScript for that (I can’t think of a strongly-typed C# expression of the same thing, but feel free to prove me wrong in the comments).
Here’s the code for the monad:
function identity(value) {
var bind = function (operation) {
return operation(value);
}
bind.value = value;
return bind;
}
And here’s how you would use it:
var a = identity("some string")
(function(s) {
return identity(s + " was processed by identity.");
})
(function(s) {
return identity(s + " And it's chainable...");
})
.value;
This is the strict identity monad. The unit or boxing function is the identity function, the unboxing is done by getting the value property, and binding is done by using the monadic instance (the result of calling identity) as a function that takes the operation as its argument.
We can relax the monad definition like we did with the C# identity implementation from my first monad post by calling identity from the monad’s code instead of leaving that responsibility to the operation code:
function identity(value) {
var bind = function (operation) {
return identity(operation(value));
}
bind.value = value;
return bind;
}
The use of the monad is then simplified:
var a = identity("some string")
(function(s) { return s + " was processed by identity."; })
(function(s) { return s + " And it's chainable..."; })
.value;
We can also express the stateful monad:
function stateful(value, state) {
var bind = function (operation) {
return stateful(operation.call(bind, value), bind.state);
}
bind.value = value;
bind.state = state;
return bind;
}
And use it as follows:
var p = stateful(3, "This is the state")
(function (i) { return ++i; })
(function (i) { this.state += " altered"; return i * 10; })
(function (i) { return i + 2; });
alert("Value: " + p.value + "; state: " + p.state);
Note how the operation is getting called through JavaScript Function’s call method, which enables us to set the meaning of “this” from within the function body. That’s what enables one of the operations to manipulate the state through this.state.
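To see that mechanism in isolation, here is a minimal sketch of how call sets “this”. The names box and appendToLabel are made up for this illustration and are not part of CallStream:

```javascript
// Hypothetical object standing in for the monadic instance ("bind" above).
var box = { state: "initial" };

// An ordinary function that reads and writes whatever "this" happens to be.
function appendToLabel(suffix) {
    this.state += suffix;
    return this.state;
}

// The first argument of call becomes "this" inside the function body;
// the remaining arguments are passed through as regular parameters.
var result = appendToLabel.call(box, " altered");
```

After this runs, both result and box.state hold the modified string, which is how an operation in the stateful example gets to change the state of the instance it was called on.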
Now if only JavaScript had a Lambda syntax so that we could stop writing function(i) {…return…}…
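For what it’s worth, later versions of the language (ECMAScript 2015 and up) did add such a syntax: arrow functions. Here is a sketch of the relaxed identity example rewritten with them, assuming an engine that supports ES2015:

```javascript
// Same relaxed identity monad as above, written with arrow functions.
const identity = value => {
    // The monadic instance is itself the bind function.
    const bind = operation => identity(operation(value));
    bind.value = value;
    return bind;
};

const a = identity("some string")
    (s => s + " was processed by identity.")
    (s => s + " And it's chainable...")
    .value;
```

One caveat: arrow functions do not get their own “this”, so the stateful example, which relies on call to set “this”, would still need ordinary function expressions for the operations that touch the state.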
The original CallStream post:
http://dbj.org/dbj/?p=514
How I understood monads, part 2/2: have we met before?
The first post in this series can be found here:
http://weblogs.asp.net/bleroy/archive/2010/06/16/how-i-understood-monads-part-1-2-sleepless-and-self-loathing-in-seattle.aspx
Last time, I tried to explain how beer and Lou helped me finally understand monads. I gave a couple of trivial examples. Hopefully now that you have this new functional hammer, everything will start looking like a nail.
So let’s have a look at a few screws this time.
Like last time, I’ll be pretty liberal with the definition of a monad as long as the spirit of monads seems to be preserved. Feel free to nitpick in the comments ;)
Oh yeah, jQuery is a monad
In jQuery, you first create a wrapped set of HTML elements and then execute operations on it, the result of which is a new wrapped set:
$(".foo")
.filter(":odd")
.map(function () {
return this.id;
});
Seems familiar? That’s because it is: the wrapped set is a monad. Pretty much.
The unit function of the monad, which is responsible for boxing an object into the monadic type, is the $ function: $(elt) is building the wrapped set for the elt element.
The binding operation is more implicit than it’s been in previous C# examples and consists in just executing one of the built-in set operations or one of the plug-ins: the operation is specified by simply calling it. So for the monad to really take arbitrary operations there is a registration phase here (the registration of the plug-in) but what the operations themselves are doing is in the spirit of monads: they act on the underlying DOM or JavaScript types and return a new wrapped set.
Unboxing the wrapped elements is done by calling the get function.
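To make the parallel concrete, here is a deliberately tiny sketch of a wrapped set in plain JavaScript. It is not how jQuery is implemented: wrap, filter, map and get below are simplified stand-ins written for this post, and they work on plain arrays instead of DOM elements.

```javascript
// Unit: box an array of items into a wrapped set (hypothetical stand-in for $).
function wrap(items) {
    return {
        // Each operation works on the raw items and re-boxes the result.
        filter: function (predicate) {
            return wrap(items.filter(predicate));
        },
        map: function (operation) {
            return wrap(items.map(operation));
        },
        // Unboxing: hand back a copy of the underlying array.
        get: function () {
            return items.slice();
        }
    };
}

var ids = wrap([{ id: "a" }, { id: "b" }, { id: "c" }, { id: "d" }])
    .filter(function (item, index) { return index % 2 === 1; })
    .map(function (item) { return item.id; })
    .get();
```

The three roles are all there: wrap boxes, every operation returns a new box, and get unboxes.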
Oh yeah, FluentPath is a couple of monads
Ah, here’s one I wrote before I even understood what a monad is (assuming that I do now). FluentPath is wrapping System.IO’s string paths and arrays of string paths into Path and PathCollection boxes. Once this is done, you can execute one of the built-in operations, the result of which is also a wrapped path or path collection. You can also execute custom operations:
var dirs = Path.Get("video")
.GetFiles("*.avi", true)
.Copy(p => p.Parent.Combine("backup").Combine(p.FileName))
.ChangeExtension(p => p.Extension + ".bak")
.Map(p => p.Up(2));
This is pretty close to jQuery (for a reason, FluentPath was inspired by jQuery).
Path.Get is wrapping the path, the operations are method calls, with or without Lambda parameters.
An example of an arbitrary or custom operation on the set can be seen in the example above: the Map call is taking each element of the path collection and going two levels up on it. The result is a new wrapped set of paths.
Unboxing is done by calling ToString on a path.
Oh yeah, Linq providers are monads
Just look at this:
var l = new[] {"foo", "bar", "baz"}
.Select(s => "... " + s)
.Select(s => s + " bleroy was here.");
Looks familiar? Again, it should be. We could just as well write this same example this way:
var l = from s in
from s in new[] {"foo", "bar", "baz"}
select "... " + s
select s + " bleroy was here.";
See? The IEnumerable<T> interface is the box and the selects are binding the operations.
What’s interesting here is that we have a way to specify operations on an enumeration of things that is independent from the kind of thing we’re working on: an array like above, or a SQL Server table, an XML document or whatever has a Linq provider.
It’s also independent of how the operations get executed: a Linq provider abstracts away how the expression of the operation gets translated into actual operations that the underlying type can understand.
This second abstraction is where the real magic in Linq resides: when you write a where clause in Linq, the actual implementation of that clause may be SQL or a C# operation or who knows what, but how you write the expression itself doesn’t change.
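The first kind of independence, from the kind of thing being enumerated, can be illustrated outside of .NET too. Here is a rough JavaScript analogy (this is not Linq, and it assumes an ES2015 engine): a hypothetical select helper written as a generator, which works on anything iterable and only does the work when the result is enumerated.

```javascript
// Hypothetical lazy "select": works on any iterable, not just arrays.
function* select(source, operation) {
    for (const item of source) {
        yield operation(item);
    }
}

const decorate = s => "... " + s;

// The same operation applied to two different kinds of sources.
const fromArray = Array.from(select(["foo", "bar", "baz"], decorate));
const fromSet = Array.from(select(new Set(["foo", "bar"]), decorate));
```

What this sketch cannot show is the second abstraction: a real Linq provider can translate the expression into something else entirely, such as SQL, instead of just running it.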
And while we’re talking about expressions…
Oh yeah, Expression<T> is a monad
An expression in .NET is a representation of a function:
Expression<Func<string, string>> expr =
s => s + " appended some stuff.";
In this example, we are taking a Lambda and wrapping it into an expression object. The beauty of it is that this object can then be manipulated at runtime to build new expressions, which can then be unwrapped into functions and executed.
By now you should have seen where I’m going with this: boxing a Lambda is done by wrapping it into an Expression and unboxing is done by calling Compile() on the expression.
The Bind operation is less obvious and would not fit in this post but it can be done by implementing an expression tree visitor.
Oh yeah, Parallel.ForEach is a monad
The new parallel extensions in .NET 4 are quite interesting as they offer parallel computation under a familiar model.
For example, you can do this:
Parallel.ForEach(
Directory.GetFiles(".", "*.jpg"),
path => {
var bitmap = new Bitmap(path);
bitmap.RotateFlip(RotateFlipType.RotateNoneFlipXY);
bitmap.Save(path + ".flipped");
});
This code will execute a rotation operation on each JPEG image in the current directory, in parallel as much as possible, which is ideal for long-running and independent operations such as image manipulation.
What really happens is that the enumeration of files returned by Directory.GetFiles is getting wrapped into a ParallelEnumerable, and then the operation is getting executed on each of the elements of the enumeration in parallel.
The boxing operation is thus represented here by the AsParallel method (which is implicitly called by Parallel.ForEach). The binding operation of the monad is Parallel.ForEach.
What we are seeing here is a monad whose only goal is to provide a different execution model for an existing operation. The value of the monad pattern is quite clear in this example: a familiar programming model is being quite naturally extended.
Oh yeah, C#4’s dynamic is a monad
Thanks to the new dynamic keyword in C#4, it is now possible to ask the C# compiler to relax compile-time type checking and instead resolve the members of some objects at runtime.
Here’s a somewhat trivial example:
dynamic foo = new {
bar = "baz"
};
Console.WriteLine(foo.bar);
This example is rather pointless but what’s important is that what can be wrapped into a dynamic object comes in many shapes: .NET objects can be accessed this way (through reflection), but you can also wrap COM objects or pretty much anything provided that somebody wrote the code to resolve the underlying type system at runtime.
Boxing is done by assigning the object from the underlying type system to a dynamic variable like above. You can then perform any C# operations on the wrapped object and access the members of the underlying type system exactly as if it were a native .NET object.
The monad binding is implicit here. What really happens is that the operations are being delegated at runtime to the underlying type system by the dynamic object provider.
Monads everywhere
With functional programming becoming more mainstream and insidiously getting into more “classical” languages such as C# or JavaScript, it’s probably not such a big surprise that its patterns would find their way into our everyday APIs. Once you know the monad pattern, all sorts of things that you weren’t noticing before suddenly seem to jump into focus, like one of those optical illusions that at first you don’t see, and that once you see cannot be unseen.
Did you spot monads in familiar places? Let me know in comments.
Next time, I’ll look at expressing monads with CallStream, and after that I’ll make a post to answer some of the comments I got on this post and the previous one, such as “why should I care?”
How I understood monads, part 1/2: sleepless and self-loathing in Seattle
For some time now, I had been noticing some interest in monads, mostly in the form of unintelligible (to me) blog posts and comments saying “oh, yeah, that’s a monad” about random stuff as if it were absolutely obvious and as if, not knowing what they were talking about, I was probably an uneducated idiot, ignorant of the simplest and most fundamental concepts of functional programming. Fair enough, I am pretty much exactly that.
Being the kind of guy who can spend eight years in college just to understand a few interesting concepts about the universe, I had to check it out and try to understand monads so that I too can say “oh, yeah, that’s a monad”.
Man, was I hit hard in the face with the limitations of my own abstract thinking abilities. All the articles I could find about the subject seemed to be vaguely understandable at first but very quickly overloaded the very few concept slots I have available in my brain. They also seemed to be consistently using arcane notation that I was entirely unfamiliar with.
It finally all clicked together one Friday afternoon during the team’s beer symposium when Louis was patient enough to break it down for me in a language I could understand (C#). I don’t know if being intoxicated helped. Feel free to read this with or without a drink in hand.
So here it is in a nutshell: a monad allows you to manipulate stuff in interesting ways. Oh, OK, you might say. Yeah. Exactly.
Let’s start with a trivial case:
public static class Trivial {
public static TResult Execute<T, TResult>(
this T argument,
Func<T, TResult> operation) {
return operation(argument);
}
}
This is not a monad. I removed most concepts here to start with something very simple. There is only one concept here: the idea of executing an operation on an object. This is of course trivial and it would actually be simpler to just apply that operation directly on the object. But please bear with me, this is our first baby step. Here’s how you use that thing:
"some string"
.Execute(s => s + " processed by trivial proto-monad.")
.Execute(s => s + " And it's chainable!");
What we’re doing here is analogous to having an assembly chain in a factory: you can feed it raw material (the string here) and a number of machines that each implement a step in the manufacturing process and you can start building stuff. The Trivial class here represents the empty assembly chain, the conveyor belt if you will, but it doesn’t care what kind of raw material gets in, what gets out or what each machine is doing. It is pure process.
A real monad will need a couple of additional concepts. Let’s say the conveyor belt needs the material to be processed to be contained in standardized boxes, just so that it can safely and efficiently be transported from machine to machine or so that tracking information can be attached to it.
Each machine knows how to treat raw material or partly processed material, but it doesn’t know how to treat the boxes so the conveyor belt will have to extract the material from the box before feeding it into each machine, and it will have to box it back afterwards.
This conveyor belt with boxes is essentially what a monad is. It has one method to box stuff, one to extract stuff from its box and one to feed stuff into a machine.
So let’s reformulate the previous example but this time with the boxes, which will do nothing for the moment except containing stuff.
public class Identity<T> {
public Identity(T value) {
Value = value;
}
public T Value { get; private set;}
public static Identity<T> Unit(T value) {
return new Identity<T>(value);
}
public static Identity<U> Bind<U>(
Identity<T> argument,
Func<T, Identity<U>> operation) {
return operation(argument.Value);
}
}
Now this is a true-to-the-definition monad, including the weird naming of the methods. It is the simplest monad, called the identity monad, and of course it does nothing useful. Here’s how you use it:
Identity<string>.Bind(
Identity<string>.Unit("some string"),
s => Identity<string>.Unit(
s + " was processed by identity monad.")).Value
That of course is seriously ugly. Note that the operation is responsible for re-boxing its result. That is a part of strict monads that I don’t quite get and I’ll take the liberty to lift that strange constraint in the next examples.
To make this more readable and easier to use, let’s build a few extension methods:
public static class IdentityExtensions {
public static Identity<T> ToIdentity<T>(this T value) {
return new Identity<T>(value);
}
public static Identity<U> Bind<T, U>(
this Identity<T> argument,
Func<T, U> operation) {
return operation(argument.Value).ToIdentity();
}
}
With those, we can rewrite our code as follows:
"some string".ToIdentity()
.Bind(s => s + " was processed by monad extensions.")
.Bind(s => s + " And it's chainable...")
.Value;
This is considerably simpler but still retains the qualities of a monad. But it is still pointless.
Let’s look at a more useful example, the state monad, which is basically a monad where the boxes have a label. It’s useful to perform operations on arbitrary objects that have been enriched with an attached state object.
public class Stateful<TValue, TState> {
public Stateful(TValue value, TState state) {
Value = value;
State = state;
}
public TValue Value { get; private set; }
public TState State { get; set; }
}
public static class StateExtensions {
public static Stateful<TValue, TState>
ToStateful<TValue, TState>(
this TValue value,
TState state) {
return new Stateful<TValue, TState>(value, state);
}
public static Stateful<TResult, TState>
Execute<TValue, TState, TResult>(
this Stateful<TValue, TState> argument,
Func<TValue, TResult> operation) {
return operation(argument.Value)
.ToStateful(argument.State);
}
}
You can get a stateful version of any object by calling the ToStateful extension method, passing the state object in. You can then execute ordinary operations on the values while retaining the state:
var statefulInt = 3.ToStateful("This is the state");
var processedStatefulInt = statefulInt
.Execute(i => ++i)
.Execute(i => i * 10)
.Execute(i => i + 2);
Console.WriteLine("Value: {0}; state: {1}",
processedStatefulInt.Value, processedStatefulInt.State);
This monad differs from the identity by enriching the boxes. There is another way to give value to the monad, which is to enrich the processing. An example of that is the writer monad, which can be typically used to log the operations that are being performed by the monad. Of course, the richest monads enrich both the boxes and the processing.
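As a rough illustration of enriching the processing, here is a sketch of a writer-like monad in the same relaxed spirit as the examples above. It is written in JavaScript for brevity; the names writer and log are chosen for this sketch, and a textbook writer monad would have the operations themselves return the log entries instead of receiving a description from the caller.

```javascript
// Writer-like monad sketch: every bound operation is recorded in a log.
function writer(value, log) {
    log = log || [];
    return {
        value: value,
        log: log,
        // Bind: run the operation, then re-box the result with an extended log.
        bind: function (description, operation) {
            var result = operation(value);
            return writer(result, log.concat(description + " -> " + result));
        }
    };
}

var traced = writer(3)
    .bind("increment", function (i) { return i + 1; })
    .bind("times ten", function (i) { return i * 10; })
    .bind("plus two", function (i) { return i + 2; });
```

The operations are the same plain functions as in the stateful example; it is the conveyor belt that does the extra work of keeping a trace in traced.log.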
That’s all for today. I hope with this you won’t have to go through the same process that I did to understand monads and that you haven’t gone into concept overload like I did.
Next time, we’ll examine some examples that you already know but we will shine the monadic light, hopefully illuminating them in a whole new way. Realizing that this pattern is actually in many places but mostly unnoticed is what will enable the truly casual “oh, yes, that’s a monad” comments.
Part 2/2 of this series can be found here:
http://weblogs.asp.net/bleroy/archive/2010/06/29/how-i-understood-monads-part-2-2-have-we-met-before.aspx
Here’s the code for this article:
http://weblogs.asp.net/blogs/bleroy/Samples/Monads.zip
The Wikipedia article on monads:
http://en.wikipedia.org/wiki/Monads_in_functional_programming
This article was invaluable for me in understanding how to express the canonical monads in C# (interesting Linq stuff in there):
http://blogs.msdn.com/b/wesdyer/archive/2008/01/11/the-marvels-of-monads.aspx
How Orchard works
I just finished writing a long documentation topic on the Orchard project wiki that aims at being a good starting point for developers who want to understand the architecture, structure and general philosophy behind the Orchard CMS.
It is not required reading for anyone who only wants to write Orchard modules and themes but hopefully it will help people who want to evaluate the platform and start writing patches.
Read it here:
http://orchardproject.net/docs/How-Orchard-works.ashx
When failure is a feature
Warning: this post is going to be slightly off-topic and non-technical. Well, not computer science technical at least.
I was reading an article in SciAm this morning about the possibility of a robot uprising. Don’t laugh yet, this is a very real, if still quite remote possibility.
The main idea that was described was that AI could rise one day to self-awareness and to an ability to improve itself through self-replication beyond human abilities to control it.
Sure, that’s one possibility, and some people are actually arguing that if that’s the case, maybe it’s just the march of evolution and humankind is just destined to one day become obsolete and be replaced by something fitter, whether from good old evolution or by artificially creating its own replacement.
I would tend to agree but I do have an objection. There is a distinction in this kind of speculation that is not often pointed out: self-replication and evolution are not the same thing.
We don’t know exactly how a self-replicating pre-biotic device emerged out of inanimate matter a few billion years ago, but that it did and that it managed to evolve to the variety of life that we observe today was only possible because of one crucial little feature of the whole system: failure.
Because they arose naturally, the first pre-biotic replicators probably were clunky little things that were only working within specific conditions and that were failing often. In particular, such an imperfect replication mechanism could fail in lots of different ways under the influence of a varying environment.
And this is precisely what enabled them to mutate into something a lot more interesting. That ability to create imperfect copies of themselves is what made it possible for early organisms to adapt to an environment that is sure to change (from external causes as well as under the influence of the organisms themselves on their own environment).
Guess what? This “strategy” has been so successful that it is still a feature of all living organisms today. You’d think that organisms could have evolved to remove the imperfections and to ensure perfectly faithful self-replication. But that in the long run would be a losing strategy because the next time the environment changes in a way that is no longer compatible with your perfection, you die and leave no successor.
In fact, modern organisms did evolve such perfecting mechanisms that suppress mutations from expressing themselves, but the amazing thing is that this suppression can fail under stress, which might be part of what you observe at times of mass extinctions, which are also fertile events in that they trigger the appearance of a large number of new species that are fit to the new conditions in a geologically short amount of time.
Sex on the other hand is another evolved way of introducing variation into the genes of a population. In other words, sex is the greatest invention of all times after failure.
What I’m getting at is that we may be able in the not so remote future to create self-replicating machines and when we do, we might make the mistake of making the replication perfect. It may actually be a lot simpler to do so: the easiest to build self-replicating structure probably is a lot simpler than the messy stuff nature came up with through random processes and selection.
And if the replication is perfect, this leaves no room for anything new, ever. What it could allow on the other hand is a reproduction mechanism that is much more efficient than anything life has been able to invent yet, the sort of manufactured efficiency that could take over the planet.
Self-awareness is clearly not necessary for self-replication and might even impair it so why they are packed together so often is kind of a mystery to me.
So why am I talking about this today? To make the larger point that failure can be an essential creative force and that without it the world would be a cold and sterile place where creativity has no role to play.
I often tell my 6-year-old that if she never fails, she will never learn. Failed ideas for example are at the heart of the concept of brainstorming: failure is the path to success.
Writing the tests for FluentPath
Writing the tests for FluentPath is a challenge. The library is a wrapper around a legacy API (System.IO) that wasn’t designed to be easily testable.
If it were more testable, the sensible testing methodology would be to tell System.IO to act against a mock file system, which would enable me to verify that my code is doing the expected file system operations without having to manipulate the actual, physical file system: what we are testing here is FluentPath, not System.IO.
Unfortunately, that is not an option as nothing in System.IO enables us to plug a mock file system in. As a consequence, we are left with few options. A few people have suggested that I abstract my calls to System.IO away so that I could tell FluentPath – not System.IO – to use a mock instead of the real thing.
That in turn is getting a little silly: FluentPath already is a thin abstraction around System.IO, so layering another abstraction between them would double the test surface while bringing little or no value. I would have to test that new abstraction layer, and that would bring us back to square one.
Unless I’m missing something, the only option I have here is to bite the bullet and test against the real file system. Of course, the tests that do that can hardly be called unit tests. They are more integration tests as they don’t only test bits of my code. They really test the successful integration of my code with the underlying System.IO.
In order to write such tests, the techniques of BDD work particularly well as they enable you to express scenarios in natural language, from which test code is generated. Integration tests are better expressed as scenarios orchestrating a few basic behaviors, so this is a nice fit.
The Orchard team has been successfully using SpecFlow for integration tests for a while and I thought it was pretty cool so that’s what I decided to use.
Consider for example the following scenario:
Scenario: Change extension
Given a clean test directory
When I change the extension of bar\notes.txt to foo
Then bar\notes.txt should not exist
And bar\notes.foo should exist
This is human readable and tells you everything you need to know about what you’re testing, but it is also executable code.
What happens when SpecFlow compiles this scenario is that it executes a bunch of regular expressions that identify the known Given (set-up phases), When (actions) and Then (result assertions) to identify the code to run, which is then translated into calls into the appropriate methods. Nothing magical. Here is the code generated by SpecFlow:
[NUnit.Framework.TestAttribute()]
[NUnit.Framework.DescriptionAttribute("Change extension")]
public virtual void ChangeExtension() {
TechTalk.SpecFlow.ScenarioInfo scenarioInfo =
new TechTalk.SpecFlow.ScenarioInfo("Change extension",
((string[])(null)));
#line 6
this.ScenarioSetup(scenarioInfo);
#line 7
testRunner.Given("a clean test directory");
#line 8
testRunner.When("I change the extension of " +
"bar\\notes.txt to foo");
#line 9
testRunner.Then("bar\\notes.txt should not exist");
#line 10
testRunner.And("bar\\notes.foo should exist");
#line hidden
testRunner.CollectScenarioErrors();
}
The #line directives are there to give clues to the debugger, because yes, you can put breakpoints into a scenario.
The way you usually write tests with SpecFlow is that you write the scenario first, let it fail, then write the translation of your Given, When and Then into code if they don’t already exist, which results in running but failing tests, and then you write the code to make your tests pass (you implement the scenario).
In the case of FluentPath, I built a simple Given method that builds a simple file hierarchy in a temporary directory that all scenarios are going to work with:
[Given("a clean test directory")]
public void GivenACleanDirectory() {
_path = new Path(SystemIO.Path.GetTempPath())
.CreateSubDirectory("FluentPathSpecs")
.MakeCurrent();
_path.GetFileSystemEntries()
.Delete(true);
_path.CreateFile("foo.txt",
"This is a text file named foo.");
var bar = _path.CreateSubDirectory("bar");
bar.CreateFile("baz.txt", "bar baz")
.SetLastWriteTime(DateTime.Now.AddSeconds(-2));
bar.CreateFile("notes.txt",
"This is a text file containing notes.");
var barbar = bar.CreateSubDirectory("bar");
barbar.CreateFile("deep.txt", "Deep thoughts");
var sub = _path.CreateSubDirectory("sub");
sub.CreateSubDirectory("subsub");
sub.CreateFile("baz.txt", "sub baz")
.SetLastWriteTime(DateTime.Now);
sub.CreateFile("binary.bin",
new byte[] {0x00, 0x01, 0x02, 0x03,
0x04, 0x05, 0xFF});
}
Then, to implement the scenario that you can read above, I had to write the following When:
[When("I change the extension of (.*) to (.*)")]
public void WhenIChangeTheExtension(
string path, string newExtension) {
var oldPath = Path.Current.Combine(path.Split('\\'));
oldPath.Move(p => p.ChangeExtension(newExtension));
}
As you can see, the When attribute is specifying the regular expression that will enable the SpecFlow engine to recognize what When method to call and also how to map its parameters. For our scenario, “bar\notes.txt” will get mapped to the path parameter, and “foo” to the newExtension parameter.
And of course, the code that verifies the assumptions of the scenario:
[Then("(.*) should exist")]
public void ThenEntryShouldExist(string path) {
Assert.IsTrue(_path.Combine(path.Split('\\')).Exists);
}
[Then("(.*) should not exist")]
public void ThenEntryShouldNotExist(string path) {
Assert.IsFalse(_path.Combine(path.Split('\\')).Exists);
}
These steps should be written with reusability in mind. They are building blocks for your scenarios, not implementation of a specific scenario. Think small and fine-grained. In the case of the above steps, I could reuse each of those steps in other scenarios.
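The way a step’s text gets matched to a method and its parameters can be sketched in a few lines. This is a hypothetical miniature registry written in JavaScript for this post, not SpecFlow’s actual implementation; when, run and steps are made-up names, and anchoring the pattern with ^ and $ is a choice of this sketch.

```javascript
// Hypothetical miniature step registry; not SpecFlow's actual implementation.
var steps = [];

function when(pattern, handler) {
    // Anchor the pattern so that the whole step text has to match.
    steps.push({ regex: new RegExp("^" + pattern + "$"), handler: handler });
}

function run(stepText) {
    for (var i = 0; i < steps.length; i++) {
        var match = steps[i].regex.exec(stepText);
        if (match) {
            // Capture groups become the handler's arguments, in order.
            return steps[i].handler.apply(null, match.slice(1));
        }
    }
    throw new Error("No step definition matches: " + stepText);
}

when("I change the extension of (.*) to (.*)", function (path, newExtension) {
    return { path: path, newExtension: newExtension };
});

var bound = run("I change the extension of bar\\notes.txt to foo");
```

Because the handler only knows about its parameters and not about any particular scenario, the same registration serves every scenario that uses that sentence.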
Those tests are easy to write and easier to read, which means that they also constitute a form of documentation.
Oh, and SpecFlow is just one way to do this. Rob wrote a long time ago about this sort of thing (but using a different framework) and I highly recommend this post if I somehow managed to pique your interest:
http://blog.wekeroad.com/blog/make-bdd-your-bff-2/
And this screencast (Rob always makes excellent screencasts):
http://blog.wekeroad.com/mvc-storefront/kona-3/
(click the “Download it here” link)
Finally, Rob (him again) tells me he did a free TekPub screencast on SpecFlow:
http://tekpub.com/view/concepts/5
JavaScript local alias pattern
Here’s a little pattern that is fairly common among JavaScript developers but that is not very well known among C# developers or people doing only occasional JavaScript development.
In C#, you can use a “using” directive to create aliases of namespaces or bring them to the global scope:
namespace Fluent.IO {
using System;
using System.Collections;
using SystemIO = System.IO;
In JavaScript, the only scoping construct there is is the function, but it can also be used as a local aliasing device, just like the above using directive:
(function($, dv) {
$("#foo").doSomething();
var a = new dv("#bar");
})(jQuery, Sys.UI.DataView);
This piece of code is making the jQuery object accessible using the $ alias throughout the code that lives inside of the function, without polluting the global scope with another variable.
The benefit is even bigger for the dv alias which stands here for Sys.UI.DataView: think of the reduction in file size if you use that one a lot or about how much less you’ll have to type…
I’ve taken the habit of putting almost all of my code, even page-specific code, inside one of those closures, not just because it keeps the global scope clean but mostly because of that handy aliasing capability.
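If you want to try the pattern without jQuery or the Microsoft Ajax library on the page, here is a self-contained sketch. MyCompany.Widgets.Formatting is a made-up namespace standing in for something like Sys.UI.DataView:

```javascript
// A made-up, deeply nested namespace standing in for Sys.UI.DataView.
var MyCompany = {
    Widgets: {
        Formatting: {
            shout: function (s) { return s.toUpperCase() + "!"; }
        }
    }
};

var greeting;

(function (fmt) {
    // Inside the function, the short alias is all we need...
    greeting = fmt.shout("hello");
})(MyCompany.Widgets.Formatting);

// ...and outside of it, the alias does not exist.
var aliasLeaked = typeof fmt !== "undefined";
```

The alias is just a function parameter, so it disappears with the function and the global scope only ever sees the names that were already there.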
The fastest way to resize images from ASP.NET. And it’s (more) supported-ish.
I’ve shown before how to resize images using GDI, which is fairly common but is explicitly unsupported because we know of very real problems that this can cause. Still, many sites use that method because those problems are fairly rare, and because most people assume it’s the only way to get the job done. Plus, it works in medium trust.
More recently, I’ve shown how you can use WPF APIs to do the same thing and get JPEG thumbnails, only 2.5 times faster than GDI (even now that GDI really ultimately uses WIC to read and write images). The boost in performance is great, but it comes at a cost, that you may or may not care about: it won’t work in medium trust. It’s also just as unsupported as the GDI option.
What I want to show today is how to use the Windows Imaging Component (WIC) APIs directly from ASP.NET, without going through WPF.
The approach has the great advantage that it’s been tested and proven to scale very well. The WIC team tells me you should be able to call support and get answers if you hit problems.
Caveats exist though.
First, this is using interop, so until a signed wrapper sits in the GAC, it will require full trust.
Second, the APIs have a very strong smell of native code and are definitely not .NET-friendly.
And finally, the most serious problem is that older versions of Windows don’t offer MTA support for image decoding. MTA support is only available on Windows 7, Vista and Windows Server 2008. On 2003 and XP, you’ll only get STA support. That means that the thread safety that we so badly need for server applications is not guaranteed on those operating systems. To make it work, you’d have to spin specialized threads yourself and manage the lifetime of your objects, which is outside the scope of this article.
We’ll assume that we’re fine with all this and that we’re running on 7 or 2008 under full trust.
Be warned that the code that follows is not simple or very readable. This is definitely not the easiest way to resize an image in .NET.
Wrapping native APIs such as WIC in a managed wrapper is never easy, but fortunately we won’t have to: the WIC team already did it for us and released the results under MS-PL. The InteropServices folder, which contains the wrappers we need, is in the WicCop project but I’ve also included it in the sample that you can download from the link at the end of the article.
In order to produce a thumbnail, we first have to obtain a decoding frame object that WIC can use. Like with WPF, that object will contain the command to decode a frame from the source image but won’t do the actual decoding until necessary.
Getting the frame is done by reading the image bytes through a special WIC stream that you can obtain from a factory object that we’re going to reuse for lots of other tasks:
var photo = File.ReadAllBytes(photoPath);
var factory = (IWICComponentFactory)new WICImagingFactory();
var inputStream = factory.CreateStream();
inputStream.InitializeFromMemory(photo, (uint)photo.Length);
var decoder = factory.CreateDecoderFromStream(
    inputStream, null,
    WICDecodeOptions.WICDecodeMetadataCacheOnLoad);
var frame = decoder.GetFrame(0);
We can read the dimensions of the frame using the following (somewhat ugly) code:
uint width, height;
frame.GetSize(out width, out height);
This enables us to compute the dimensions of the thumbnail, as I’ve shown in previous articles.
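As a reminder of what that computation can look like, here is a minimal sketch. It is not the code from the previous articles: the ThumbnailMath class, the Fit method and the maxSize parameter are names made up for this illustration, and it assumes the thumbnail must fit inside a square box while keeping the aspect ratio of the original.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class ThumbnailMath {
    // Fits (width, height) inside a maxSize x maxSize box, preserving the aspect ratio.
    // Images that already fit are left at their original size.
    public static void Fit(uint width, uint height, uint maxSize,
                           out uint thumbWidth, out uint thumbHeight) {
        if (width == 0 || height == 0)
            throw new ArgumentException("Image dimensions must be non-zero.");
        if (width <= maxSize && height <= maxSize) {
            thumbWidth = width;
            thumbHeight = height;
            return;
        }
        // The smaller of the two ratios guarantees that both sides fit in the box.
        double scale = Math.Min((double)maxSize / width, (double)maxSize / height);
        thumbWidth = Math.Max(1u, (uint)Math.Round(width * scale));
        thumbHeight = Math.Max(1u, (uint)Math.Round(height * scale));
    }
}
```

The results are uint because that is what the WIC calls further down expect for sizes.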
We now need to prepare the output stream for the thumbnail. WIC requires a special kind of stream, IStream (which System.IO.Stream does not implement), and doesn’t directly understand .NET streams. It does provide a number of implementations but not exactly what we need here.
We need to output to memory because we’ll want to persist the same bytes to the response stream and to a local file for caching. The memory-bound version of IStream requires a fixed-length buffer but we won’t know the length of the buffer before we resize.
To solve that problem, I’ve built a derived class from MemoryStream that also implements IStream. The implementation is not very complicated: it just delegates the IStream methods to the base class, but it involves some native pointer manipulation.
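To give an idea of what such a class involves, here is a rough sketch. It is not the class from the sample download: the ComMemoryStream name is made up, and it assumes the IStream in question is the one from System.Runtime.InteropServices.ComTypes, whereas the WIC wrappers may declare their own. Only the members a simple encoder is likely to need are delegated; the rest throw.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;

// Hypothetical name; a growable in-memory stream that COM code can write to.
public class ComMemoryStream : MemoryStream, IStream {
    void IStream.Read(byte[] pv, int cb, IntPtr pcbRead) {
        int read = Read(pv, 0, cb);
        // The count goes back through a native pointer, which may be null.
        if (pcbRead != IntPtr.Zero) Marshal.WriteInt32(pcbRead, read);
    }

    void IStream.Write(byte[] pv, int cb, IntPtr pcbWritten) {
        Write(pv, 0, cb);
        if (pcbWritten != IntPtr.Zero) Marshal.WriteInt32(pcbWritten, cb);
    }

    void IStream.Seek(long dlibMove, int dwOrigin, IntPtr plibNewPosition) {
        // STREAM_SEEK_SET, _CUR and _END have the same values (0, 1, 2) as SeekOrigin.
        long position = Seek(dlibMove, (SeekOrigin)dwOrigin);
        if (plibNewPosition != IntPtr.Zero) Marshal.WriteInt64(plibNewPosition, position);
    }

    void IStream.SetSize(long libNewSize) { SetLength(libNewSize); }

    void IStream.Stat(out STATSTG pstatstg, int grfStatFlag) {
        pstatstg = new STATSTG { type = 2 /* STGTY_STREAM */, cbSize = Length };
    }

    void IStream.Commit(int grfCommitFlags) { Flush(); }

    // Not needed for this sketch.
    void IStream.CopyTo(IStream pstm, long cb, IntPtr pcbRead, IntPtr pcbWritten) {
        throw new NotSupportedException();
    }
    void IStream.Clone(out IStream ppstm) { throw new NotSupportedException(); }
    void IStream.Revert() { throw new NotSupportedException(); }
    void IStream.LockRegion(long libOffset, long cb, int dwLockType) {
        throw new NotSupportedException();
    }
    void IStream.UnlockRegion(long libOffset, long cb, int dwLockType) {
        throw new NotSupportedException();
    }
}
```

The explicit interface implementations keep the COM-style members from colliding with the Read, Write and Seek methods that MemoryStream already exposes.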
Once we have a stream, we need to build the encoder for the output format, which could be anything that WIC supports. For web thumbnails, our only reasonable options are PNG and JPEG.
I explored PNG because it’s a lossless format, and because WIC does support PNG compression. That compression is not very efficient though and JPEG offers good quality with much smaller file sizes. On the web, it matters. I found the best PNG compression option (adaptive) to give files that are about twice as big as 100%-quality JPEG (an absurd setting), 4.5 times bigger than 95%-quality JPEG and 7 times larger than 85%-quality JPEG, which is more than acceptable quality.
As a consequence, we’ll use JPEG. The JPEG encoder can be prepared as follows:
var encoder = factory.CreateEncoder(
    Consts.GUID_ContainerFormatJpeg, null);
encoder.Initialize(outputStream,
    WICBitmapEncoderCacheOption.WICBitmapEncoderNoCache);
The next operation is to create the output frame:
IWICBitmapFrameEncode outputFrame;
var arg = new IPropertyBag2[1];
encoder.CreateNewFrame(out outputFrame, arg);
Notice that we are passing in a property bag. This is where we’re going to specify our only parameter for encoding, the JPEG quality setting:
var propBag = arg[0];
var propertyBagOption = new PROPBAG2[1];
propertyBagOption[0].pstrName = "ImageQuality";
propBag.Write(1, propertyBagOption,
    new object[] { 0.85F });
outputFrame.Initialize(propBag);
We can then set the resolution for the thumbnail to be 96, something we weren’t able to do with WPF and had to hack around:
outputFrame.SetResolution(96, 96);
Next, we set the size of the output frame and create a scaler from the input frame and the computed dimensions of the target thumbnail:
outputFrame.SetSize(thumbWidth, thumbHeight);
var scaler = factory.CreateBitmapScaler();
scaler.Initialize(frame, thumbWidth, thumbHeight,
    WICBitmapInterpolationMode.WICBitmapInterpolationModeFant);
The scaler is using the Fant method, which I think is the best looking one even if it seems a little softer than cubic (zoomed here to better show the defects):
[Image comparison: Cubic, Fant, Linear, Nearest neighbor]
We can write the source image to the output frame through the scaler:
outputFrame.WriteSource(scaler, new WICRect {
X = 0, Y = 0,
Width = (int)thumbWidth,
Height = (int)thumbHeight });
And finally we commit the pipeline that we built and get the byte array for the thumbnail out of our memory stream:
outputFrame.Commit();
encoder.Commit();
var outputArray = outputStream.ToArray();
outputStream.Close();
That byte array can then be sent to the output stream and to the cache file.
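That last step could look something like the following sketch. The ThumbnailOutput class and its parameters are hypothetical names; in an ASP.NET handler the response stream would typically be the response's output stream, and the cache path is whatever your caching scheme dictates.

```csharp
using System.IO;

// Hypothetical helper, for illustration only.
public static class ThumbnailOutput {
    // Writes the same bytes to the response and to a cache file,
    // so the next request for this thumbnail can skip the resizing.
    public static void Send(byte[] thumbnail, Stream response, string cachePath) {
        response.Write(thumbnail, 0, thumbnail.Length);
        File.WriteAllBytes(cachePath, thumbnail);
    }
}
```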
Once we’ve gone through this exercise, it’s only natural to wonder whether it was worth the trouble. I ran this method, as well as GDI and WPF resizing, over thirty 12-megapixel images for JPEG qualities between 70% and 100% and measured the file size and time to resize. Here are the results:
[Chart: Size of resized images]
[Chart: Time to resize thirty 12-megapixel images]
Not much to see on the size graph: sizes from WPF and WIC are equivalent, which is hardly surprising as WPF calls into WIC. There is just an anomaly for 75% for WPF that I noted in my previous article and that disappears when using WIC directly.
But overall, using WPF or WIC over GDI represents a slight win in file size.
The time to resize is more interesting. WPF and WIC get similar times although WIC seems to always be a little faster. Not surprising considering WPF is using WIC. The margin of error on these results is probably fairly close to the time difference. As we already knew, the time to resize does not depend on the quality level, only the size does. This means that the only decision you have to make here is size versus visual quality.
This third approach to server-side image resizing on ASP.NET seems to converge on the fastest possible one. We have marginally better performance than WPF, but with some additional peace of mind that this approach is sanctioned for server-side usage by the Windows Imaging team.
It still doesn’t work in medium trust. That is a problem and shows the way for future server-friendly managed wrappers around WIC.
The sample code for this article can be downloaded from:
http://weblogs.asp.net/blogs/bleroy/Samples/WicResize.zip
The benchmark code can be found here (you’ll need to add your own images to the Images directory and then add those to the project, with content and copy if newer in the properties of the files in the solution explorer):
http://weblogs.asp.net/blogs/bleroy/Samples/WicWpfGdiImageResizeBenchmark.zip
WIC tools can be downloaded from:
http://code.msdn.microsoft.com/wictools
To conclude, here are some of the resized thumbnails at 85% quality with Fant scaling:
What happens to C# 4 optional parameters when compiling against 3.5?
Here’s a method declaration that uses optional parameters:
public Path Copy(
Path destination,
bool overwrite = false,
bool recursive = false)
Something you may not know is that Visual Studio 2010 will let you compile this against .NET 3.5, with no error or warning.
You may be wondering (as I was) how it does that. Well, it takes the easy and rather obvious way of not trying to be too smart and just ignores the optional parameters. So if you’re compiling against 3.5 from Visual Studio 2010, the above code is equivalent to:
public Path Copy(
Path destination,
bool overwrite,
bool recursive)
The parameters are not optional (no such thing in C# 3), and no overload gets magically created for you.
If you’re building a library that is going to have both 3.5 and 4.0 versions, and you want 3.5 users to have reasonable overloads of your methods, you’ll have to provide those yourself, which means that providing a version with optional parameters for the benefit of 4.0 users is not going to provide that much value, except for the ability to provide named parameters out of order. I guess that’s not so bad…
Providing all of the following overloads will compile against both 3.5 and 4.0:
public Path Copy(Path destination)
public Path Copy(Path destination, bool overwrite)
public Path Copy(
Path destination,
bool overwrite = false,
bool recursive = false)
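For illustration, here is how those overloads might be wired together. This is only a sketch: the Path class below is a minimal stand-in for the real one, and the LastCall property is a made-up addition that records which values the full overload received.

```csharp
// Minimal stand-in for the Path type; the real class does actual file work.
public class Path {
    // Hypothetical property, only here to make the effective arguments visible.
    public string LastCall { get; private set; }

    // Plain overloads for callers that don't have optional parameters.
    public Path Copy(Path destination) {
        return Copy(destination, false, false);
    }

    public Path Copy(Path destination, bool overwrite) {
        return Copy(destination, overwrite, false);
    }

    // The full version; C# 4 callers can also use named arguments on it.
    public Path Copy(
        Path destination,
        bool overwrite = false,
        bool recursive = false) {
        destination.LastCall =
            "overwrite=" + overwrite + ", recursive=" + recursive;
        return destination;
    }
}
```

From C# 4, a call such as Copy(destination, recursive: true) can only bind to the last overload, which is the named-parameter benefit mentioned above.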
RSS feeds in Orchard
When we added RSS to Orchard, we wanted to make it easy for any module to expose any contents as a feed. We also wanted the rendering of the feed to be handled by Orchard in order to minimize the amount of work from the module developer.
A typical example of such feed exposition is of course blog feeds.
We have an IFeedManager interface for which you can get the built-in implementation through dependency injection. Look at the BlogController constructor for an example:
public BlogController(
IOrchardServices services,
IBlogService blogService,
IBlogSlugConstraint blogSlugConstraint,
IFeedManager feedManager,
RouteCollection routeCollection) {
If you look a little further in that same controller, in the Item action, you’ll see a call to the Register method of the feed manager:
_feedManager.Register(blog);
This is in reality a call into an extension method that is specialized for blogs. We could have made the two calls to the actual generic Register directly in the action instead; that is just an implementation detail:
feedManager.Register(blog.Name, "rss",
    new RouteValueDictionary {
        { "containerid", blog.Id } });
feedManager.Register(blog.Name + " - Comments", "rss",
    new RouteValueDictionary {
        { "commentedoncontainer", blog.Id } });
Those two calls register two feeds: one for the blog itself and one for the comments on the blog. For each call, we provide the name of the feed, then the type of feed (“rss”), and some values to be injected into the generic RSS route that will be used later to route the feed to the right providers.
This is all you have to do to expose a new feed. If you’re only interested in exposing feeds, you can stop right there. If on the other hand you want to know what happens after that under the hood, carry on.
What happens after that is that the feed manager will take care of formatting the link tag for the feed (see FeedManager.GetRegisteredLinks). The GetRegisteredLinks method itself will be called from a specialized filter, FeedFilter. FeedFilter is an MVC filter and the event we’re interested in hooking into is OnResultExecuting, which happens after the controller action has returned an ActionResult and just before MVC executes that action result. In other words, our feed registration has already been called but the view is not yet rendered. Here’s the code for OnResultExecuting:
model.Zones.AddAction("head:after",
html => html.ViewContext.Writer.Write(
_feedManager.GetRegisteredLinks(html)));
This is another piece of code whose execution is deferred. It is saying that whenever the time comes to render the “head” zone, this code should be called right after. The code itself is rendering the link tags.
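If the idea of deferred zone actions is unfamiliar, the following sketch shows the principle in isolation. It is not Orchard’s implementation: SketchZones and its Render method are made-up names, and only the “:after” suffix is handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for a zone collection; not Orchard's actual code.
public class SketchZones {
    private readonly Dictionary<string, List<Action<TextWriter>>> _actions =
        new Dictionary<string, List<Action<TextWriter>>>();

    // Stores the action under a key such as "head:after" without running it.
    public void AddAction(string location, Action<TextWriter> action) {
        List<Action<TextWriter>> list;
        if (!_actions.TryGetValue(location, out list)) {
            list = new List<Action<TextWriter>>();
            _actions[location] = list;
        }
        list.Add(action);
    }

    // Writes the zone's own content, then runs whatever was registered to follow it.
    public void Render(string zone, string content, TextWriter writer) {
        writer.Write(content);
        List<Action<TextWriter>> list;
        if (_actions.TryGetValue(zone + ":after", out list)) {
            foreach (var action in list) action(writer);
        }
    }
}
```

Because the lambda only runs at render time, it sees everything that was registered up to that point, even things registered after the call to AddAction.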
As a result of all that, here’s what can be found in an Orchard blog’s head section:
<link rel="alternate" type="application/rss+xml"
title="Tales from the Evil Empire"
href="/rss?containerid=5" />
<link rel="alternate" type="application/rss+xml"
title="Tales from the Evil Empire - Comments"
href="/rss?commentedoncontainer=5" />
The generic action that these two feeds point to is Index on FeedController. That controller has three important dependencies: an IFeedBuilderProvider, an IFeedQueryProvider and an IFeedItemBuilder.
Different implementations of these interfaces can provide different formats of feeds, such as RSS and Atom. The Match method enables each of the competing providers to provide a priority for themselves based on arbitrary criteria that can be found on the FeedContext.
This means that a provider can be selected based not only on the desired format, but also on the nature of the objects being exposed as a feed or on something even more arbitrary such as the destination device (you could imagine for example giving shorter text only excerpts of posts on mobile devices, and full HTML on desktop). The key here is extensibility and dynamic competition and collaboration from unknown and loosely coupled parts. You’ll find this pattern pretty much everywhere in the Orchard architecture.
The RssFeedBuilder implementation of IFeedBuilderProvider is also a regular controller with a Process action that builds an RssResult, which is itself a thin ActionResult wrapper around an XDocument.
Let’s get back to the FeedController’s Index action.
After having called into each known feed builder to get its priority on the currently requested feed, it will select the one with the highest priority.
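The selection logic can be pictured with the sketch below. All the type names in it are invented for the illustration and are much simpler than the real Orchard interfaces; in particular, the context here only carries a format string.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified types; not the actual Orchard interfaces.
public class SketchContext { public string Format; }

public class SketchMatch {
    public ISketchBuilder Builder;
    public int Priority;
}

public interface ISketchBuilder {
    // Returns null when the builder cannot handle the request at all.
    SketchMatch Match(SketchContext context);
}

// A builder that volunteers for a single format with a fixed priority.
public class FormatBuilder : ISketchBuilder {
    private readonly string _format;
    private readonly int _priority;
    public string Name { get; private set; }

    public FormatBuilder(string name, string format, int priority) {
        Name = name;
        _format = format;
        _priority = priority;
    }

    public SketchMatch Match(SketchContext context) {
        if (context.Format != _format) return null;
        return new SketchMatch { Builder = this, Priority = _priority };
    }
}

public static class BuilderSelector {
    // Asks every builder for a match and keeps the one with the highest priority.
    public static ISketchBuilder Select(
        IEnumerable<ISketchBuilder> builders, SketchContext context) {
        var best = builders
            .Select(b => b.Match(context))
            .Where(m => m != null)
            .OrderByDescending(m => m.Priority)
            .FirstOrDefault();
        return best == null ? null : best.Builder;
    }
}
```

With this shape, a module can override a default builder simply by returning a higher priority for the requests it cares about.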
The next thing it needs to do is to actually fetch the data for the feed. This again is a collaborative effort from a priori unknown providers, the implementations of IFeedQueryProvider. There are several implementations by default in Orchard, the choice of which is again done through a Match method. ContainerFeedQuery for example chimes in when a “containerid” parameter is found in the context (see URL in the link tag above):
public FeedQueryMatch Match(FeedContext context) {
var containerIdValue =
context.ValueProvider.GetValue("containerid");
if (containerIdValue == null)
return null;
return new FeedQueryMatch {
FeedQuery = this, Priority = -5 };
}
The actual work is done in the Execute method, which finds the right container content item in the Orchard database and adds an element for each of the items it contains. In other words, the feed query provider knows how to retrieve the list of content items to add to the feed.
The last step is to translate each of the content items into feed entries, which is done by implementations of IFeedItemBuilder. There is no Match method this time. Instead, all providers are called with the collection of items (or more accurately with the FeedContext, but this contains the list of items, which is what’s relevant in most cases). Each provider can then choose to pick those items that it knows how to treat and transform them into the format requested.
This enables the construction of heterogeneous feeds that expose content items of various types into a single feed. That will be extremely important when you want to expose a single feed for your whole site.
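Here is a small sketch of that collaboration, with made-up types rather than the real Orchard ones: every builder sees the whole list and only fills in the entries for the content it recognizes.

```csharp
using System.Collections.Generic;

// Hypothetical content types and feed item, for illustration only.
public class SketchPost { public string Title; }
public class SketchComment { public string Author; }

public class SketchFeedItem {
    public object Content; // the content item to expose
    public string Entry;   // the formatted entry, once a builder has claimed the item
}

public interface ISketchItemBuilder {
    void Populate(IList<SketchFeedItem> items);
}

public class PostItemBuilder : ISketchItemBuilder {
    public void Populate(IList<SketchFeedItem> items) {
        foreach (var item in items) {
            var post = item.Content as SketchPost;
            if (post != null) item.Entry = "Post: " + post.Title;
        }
    }
}

public class CommentItemBuilder : ISketchItemBuilder {
    public void Populate(IList<SketchFeedItem> items) {
        foreach (var item in items) {
            var comment = item.Content as SketchComment;
            if (comment != null) item.Entry = "Comment by " + comment.Author;
        }
    }
}

public static class FeedAssembler {
    // No Match step here: every builder gets a chance to look at every item.
    public static void Build(
        IList<SketchFeedItem> items, IEnumerable<ISketchItemBuilder> builders) {
        foreach (var builder in builders) builder.Populate(items);
    }
}
```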
So here are feeds in Orchard in a nutshell. The main point here is that there is a fair number of components involved, with some complexity in implementation in order to allow for extreme flexibility, but the part that you use to expose a new feed is extremely simple and light: declare that you want your content exposed as a feed and you’re done.
There are cases where you’ll have to dive in and provide new implementations for some or all of the interfaces involved, but that requirement will only arise as needed. For example, you might need to create a new feed item builder to include your custom content type but that effort will be extremely focused on the specialized task at hand. The rest of the system won’t need to change.
So what do you think?