Skip to content

KodefuGuru - Chris Eargle
Syndicate content
Life Student of the Kodefu Arts
Updated: 7 hours 2 min ago

Shortest String That Contains All Words

Tue, 01/03/2012 - 07:47

Mango12 created an interesting competition to kick off the New Year, and I decided to try it out. It’s a simple task along the lines of a code kata, and I recommend you try it yourself before looking over my solution.

Task: Compress a list of words into the shortest string that contains all words.

Test: “testing”, “ginger”, “german”, “minutes” should become “minutestingingerman”

Here is my approach: create a weighted graph connecting each term then recursively reduce the highest weighted edges.

The test to see if it works begins with four terms.

image

Two different edges can be reduced. The test won’t be affected by which one I choose. If I optimize this in the future, it would be prudent to check the nodes and merge the two with no other good matches.

image

The choice is clear this time.

image

After the final merge, we end up with minutestingingerman.

I started off with a test to prove that it works.

[Test]
public void ShouldCompressListOfWordsToShortestStringThatContainsAllWords()
{
    var terms = new[] {"testing", "ginger", "german", "minutes"};
    StringGraph graph = new StringGraph(terms);
    Assert.AreEqual("minutestingingerman", graph.Compress());
}

I then created a StringGraph class with weighted nodes. The Node class contain edges and a function to calculate weight when a node is added. In building the solution, I encountered a problem where I was mixing mutable and immutable methods. If I return to this in the future, I should decide which direction I want to go in. I decided to just make it work for the time being.

Since I have a myriad of classes for building the graph, I  will focus on a few important points. The solution is available here if you’re interested in trying it out for yourself.

Calculating Weight

I created a Node class that contains two generics: one for the value type and another for the weight type. To create edges, nodes are added and the weight must be calculated at that time. I decided to make the weight calculation a function that can be assigned by creator. The default function will set the weight to the default of the weight type.

public class Node<T, TWeight> : INode<T, TWeight>
{
    private List<IEdge<T, TWeight>> edges = new List<IEdge<T, TWeight>>();
    
    Func<INode<T, TWeight>, INode<T, TWeight>, TWeight> calculateWeight;

    public Func<INode<T, TWeight>, INode<T, TWeight>, TWeight> 
                                               CalculateWeight { get { if (calculateWeight == null) { calculateWeight = (n1, n2) => default(TWeight); } return calculateWeight; } set { calculateWeight = value; } } public IEnumerable<IEdge<T, TWeight>> Edges { get { return edges; } private set { edges = new List<IEdge<T, TWeight>>(value); } } public T Value { get; private set; } public Node(T value) { this.Value = value; } public void Add(INode<T, TWeight> node) { var edge = new Edge<T, TWeight>(this, node,
                                        CalculateWeight(this, node)); edges.Add(edge); } }

StringGraph assigns the function which checks for the overlap at the end of the parent node and the beginning of the child node.

calculateWeight = (node, newNode) =>
    {
        int maxLength = node.Value.Length < newNode.Value.Length ?
            node.Value.Length : newNode.Value.Length;
        for (int i = node.Value.Length - maxLength; i < maxLength; i++)
        {
            if (node.Value.EndsWith(
                  newNode.Value.Substring(0, maxLength - i))) { return maxLength - i; } } return 0; };

The values generated by this function correlate with the numbers in the graphs above.

Reducing the Graph

The Reduce method is meant to determine two nodes that should be merged and create a new, smaller graph. To make my test turn green, I wrote the following method.

 

public StringGraph Reduce()
{
    if (nodes.Count <= 1)
    {
        return this;
    }

    var edges = from n in nodes
                from e in n.Edges
                select e;
    var edge = edges.First(n => n.Weight == edges.Max(e => e.Weight));
    
    return new StringGraph(nodes.Select(n => n.Value)
                                .Where(str => str != edge.Parent.Value && 
                                                    str != edge.Node.Value) .With(edge.Combine().Value)); }

Although I didn’t put all the necessary error checking in this program, I felt it important that someone does not try to reduce a 1 or 0 node graph. I then pull up every edge, obtain the first one with the highest weight, then combine the nodes it connects.

This part of the code could use some optimization. I am creating an entirely new graph, which may be correct in a class library where immutable types are expected. However, StringGraph is far from immutable, and there is overhead from making it build up a new graph. If I wanted to generate a reduced graph without affecting the current instance, it may be better to implement cloning and a better node merging algorithm. The edges going to the parent and from the child will typically be the same.

More Improvements

There are other improvements that can be made. I’m not currently detecting and removing nodes that represent inner strings, and there are likely better algorithms for NP problems. This four string example is rather simplistic, and there is much more ground that can be covered. Give it a shot and create your own solution, or improve upon the one I’ve presented. Be sure to tell me your approach in the comments!

Categories: Blogs

Static Constructors Should Be Private

Tue, 11/29/2011 - 22:48

Static constructors should be private: a pretty simple rule to follow and one most .NET developers are forced to do. So why do I bring it up? During a webinar earlier today, I demonstrated the new static constructor mocking capabilities in JustMock. I decided to offer that piece of advice because non-private static constructors are a security flaw. However, I was in a C# project, and it’s impossible to use a non-private static constructor anyway.

It’s rare that I need to use static constructors, and I was lacking an example of one with an external dependency. In fact, that’s something I would try to avoid. So, I needed to find a real world example to show why this feature is useful. I did a quick search and found this code analysis entry on MSDN: If a static constructor is not private, it can be called by other code. This is a breaking security issue. I filed this info away and continued my search for a dependency in a static constructor.

I should have read a little further. The article also states: “This rule is enforced by the C# and Visual Basic .NET compilers.” For the vast majority of .NET developers, this rule is irrelevant. Having never had an inclination to make a public static constructor, I failed to realize that I was giving inconsequent advice.

If you’re coding in a language that does support non-private static constructors, please realize the security risk involved. If you are instantiating dependencies in your static constructor, consider a different approach. If one isn’t feasible, JustMock has your back for testing the class.

Categories: Blogs

Partial Application

Wed, 11/16/2011 - 07:27

Functions have a given number of arguments which define the arity of the function. The process of producing another function by setting any number of arguments to a function is known as partial application. C# developers are most familiar with this concept by calling one method from a more specific method, thereby adhering to the DRY principle.

For an example, I have decided to name the div operator with a Divide method.

public static double Divide(double x, double y)
{
    return x / y;
}

I could then use partial application to create an Inverse method, fixing the first argument to the Divide method as the value 1.

public static double Inverse(double y)
{
    return Divide(1, y);
}

This adheres to the DRY principle since the details to the Divide method may change. Perhaps I will choose to use a math library in the future, and the Inverse method is a more specific version of the Divide method, so changes should affect both places.

The context most people hear about partial application is with functional programming. Methods are procedures or functions associated with a class, and that’s why many functional techniques are applicable in the object-oriented world. In C#, most people associate functional programming with anonymous functions, so let’s see how we can use partial application with it.

Func<double, double, double> div = (x, y) => x / y;
Func<double, double> inv = y => div(1, y);

Using partial application with lambda expressions was straightforward. First, I defined a div function. Then, I defined an inv function, calling the div function with 1 as its first argument. There is something to be desired here
 it feels like partial application should be built into the language somehow. If you call div(1), it should apply the argument and return a reduced function. Since C# doesn’t support this, we will have to use what tools we have available to build something similar.

public static class FuncExtensions
{
    public static Func<T2, TResult> Apply<T, T2, TResult>(this Func<T, T2, TResult> func, T arg1)
    {
        return arg2 => func(arg1, arg2);
    }
}

I wrote an extension method to apply the first argument to a binary function (you can fill in the rest). This cleans up the original code.

Func<double, double, double> div = (x, y) => x / y;
var inv = div.Apply(1.0);

The compiler cannot expand the type to match the extension method, so it is necessary to specify a double when passing the argument. However, it is now possible to use an implicit type declaration since the method returns a Func with one less generic parameter.

I believe I explained clearly why you should care about this when it comes to methods, but it may be less clear why you need this when it comes to anonymous functions. You may be defining functions in the same manner you are doing methods, and you are attempting to not repeat yourself. What I think may be a more compelling case is that sometimes you need to pass functions with a specific signature, and the best way to achieve that signature is to apply known information. This is powerful in systems that dynamically create and consume functions (business rules), and it simplifies the passing of data as functions carry requisite values.

Partial application is a useful tool, and you’ve likely used it without realizing it.

Categories: Blogs

Arity

Thu, 11/10/2011 - 07:40

The term ‘arity’ evokes images of complex differential diagrams that requires a graduate degree to understand, but the concept itself is rather simple. Arity is a value that represents the number of arguments a function takes. Does your method accept a single argument? Then it’s arity is one. Does it accept two arguments? Then it’s arity is two. Simple.

So it’s simple: a method’s arity is identical to its number of parameters? Not so fast! Arity refers specifically to arguments. Arguments are the values you pass into a method or function. A parameter typically expects you to pass a value to it when you call a method, but it’s possible that the value has been defaulted.

Bob Martin’s Clean Code Tip of the Week #10 was “Avoid Too Many Arguments.” That seems fair enough, but why is it that you hear things like “functions rarely have an arity greater than one.” There are two primary issues at play here.

1. Function inputs can be represented as composite types so that every argument is reduced to one.

2. Functions can be curried so that functions have one argument and returns another function (which has one argument and continues until the full chain is unraveled).

When closures occur, number one is sure to follow to capture the variables used within the lambda expression. The important thing to note is that although the compiler is taking care of the closure, human readers of the code are still experiencing many arguments. Uncle Bob’s tip should still be followed.

Number two is a powerful language feature, but it suffers from the same issue as above.

Lower arity also occurs when side effects from other external sources (such as system variables) are resulting in affected out with fewer arguments. This is undesirable, but sometimes necessary.

As always, keep your code readable and maintainable. Strive for lower arity, but don’t let it break you. Besides, I’d rather you call it “binary” than “arity 2.”

Categories: Blogs