CodeBetter.Com

The State of IronRuby on the ALT.NET Podcast

Mike apparently took some time off from being a new Dad to gather up the latest news on IronRuby.

The State of IronRuby

 


Reminder: DC ALT.NET - 8/28/2008 - Ruby with Jeff Schoolcraft

The August meeting for DC ALT.NET will be on August 28th, 2008 from 7-9PM.  Check our site and our mailing list for continuing updates and future meetings.  This month, Jeff Schoolcraft, ASP.NET MVP, will host a conversation on Ruby.  This will include some Ruby demos, a little bit of Ruby on Rails, as well as approaching it from the ASP.NET mindset.  This talk will go very nicely with our talk on ASP.NET MVC next month.

Approaching Ruby

One of the main intentions of the ALT.NET group, I've found, is constant improvement.  This month is no exception as we're stepping outside of the statically typed world that many live in with C#, VB.NET, F# and so on.  We'll be covering the MRI and will hopefully get to IronRuby as well.  You can find out more about the progress of IronRuby on the IronRuby site.

Details

The details of the event are as follows:

Date/Time: 8/28/2008 - 7-9PM

Location:
Motley Fool
2000 Duke Street
Alexandria, VA, 22314
Map Link

Hope to see a great crowd there!



An entry into lean

I, like many others, have been head deep into lean methodologies such as kaizen, kanban, 5S, value streams and lean in general.  As I continue to learn and practice these things, I’m going to start publishing, much like Laribee is with his focus on Kanban, in order to gain feedback and ideas.  I’m going to cover things in a bit more general manner than just approaching one methodology, but hope to hit on them all.

Today I just want to give a primer into lean so those of you who haven’t done much reading into it have a foundation from which we will build.

When you hear lean, it’s difficult not to throw the word efficiency into every sentence under discussion.  Efficiency is a metric and is easily measured for most things.  If your car is rated to get 20 MPG and you are achieving only 15 MPG, then your car is 75% efficient (15 / 20) as it relates to its gas mileage.

Efficiency has its counterpart, however, and this is waste.  Many people will tell you lean is about eliminating waste, but that is not entirely true.  Lean is about improving efficiency, and waste elimination is typically the least expensive, most effective way to improve efficiency, but it’s not the only one.  Thus, don’t focus solely on waste elimination, but on the improvement of efficiency itself.  For developers, obvious waste is easy to spot.  Phone calls, emails, ESPN.com and the like are the main culprits.  You have to stop and evaluate each activity by asking yourself: does this activity help me achieve my goal for the day as it pertains to adding value to my client/product/service etc.?  Identify – correct – sustain.

Kanban, as Dave has been implementing, is one system that helps with waste elimination by providing a feedback loop and a continuous work flow, in which downstream steps pull work from upstream and current status is evaluated along the way.  Other methodologies rest on different principles but aim at the same goals, and I will be talking about those too, as each is a form of lean used to Identify Inefficiencies – Make Corrections – Sustain Positive Process Improvements.

Lean literature is everywhere.  Take some of the keywords I’ve talked about here and search the web for them.  You’ll find lots of great information.

NHibernate 2.0: Changes Overview

My post .NET Framework 3.5 SP1: Changes Overview, which used NDepend to analyze the evolution, structure and quality of the .NET Framework code base, became popular. It shows the community's interest in what goes on under the hood of popular frameworks. Thus came the idea of publishing a similar post for the new release of NHibernate 2.0. This post analyzes NHibernate 2.0 with NDepend. The post NHibernate 2.0 Gold Release lists breaking functional changes, while I'll focus more on structural changes.

 

Changes Overview (compared to NHibernate v1.2.1)

# IL instructions    176 818 to 245 106      (+68 288   +38.6%)
# lines of code (LOC)    25 960 to 36 143      (+10 183   +39.2%)
# lines of comment    25 135 to 29 401      (+4 266   +17%)
Percentage Comment    49% to 44%      (-5%)
# Assemblies    1
# Namespaces    46 to 65      (+19   +41.3%)
# Types    806 to 1 262      (+456   +56.6%)
# Methods    8 190 to 12 016      (+3 826   +46.7%)
# Fields    2 166 to 3 842      (+1 676   +77.4%)


4 852 new public methods:

SELECT METHODS WHERE IsPublic AND WasAdded

 

548 new public types:  

SELECT TYPES WHERE IsPublic AND WasAdded

 

21 new namespaces:

SELECT NAMESPACES WHERE WasAdded

 

2 namespaces removed:  

SELECT NAMESPACES WHERE WasRemoved

 

127 public types removed:  

SELECT TYPES WHERE IsPublic AND WasRemoved

 

5 non-public methods became public: 

SELECT METHODS WHERE IsPublic AND VisibilityWasChanged AND IsInNewerBuild

 

1 230 methods where code was changed:

SELECT METHODS WHERE CodeWasChanged

 

380 types where code was changed

SELECT TYPES WHERE CodeWasChanged

 

The following treemap/metric view shows the impact of methods that were changed or added (in blue). It clearly looks as if the entire code base has been refactored:

 

 

Assembly dependencies

The assembly structure hasn't changed. The NHibernate code is still packaged within one assembly named NHibernate.dll, and this is IMHO the best possible choice since there is no reason to have several physical components for the NHibernate Fx.

 

NHibernate.dll uses almost the same set of assemblies: Castle.Core.dll v1.0.3.0 has been added, and Castle.DynamicProxy2.dll v2.0.3.0 is used in lieu of Castle.DynamicProxy.dll v1.1.5.0. Here is the simple graph of assembly dependencies (made with NDepend v2.10, which is still under development, so stay tuned for this exciting release scheduled in a few weeks).

 

 

Internal namespaces dependencies

As noted by Jim Bolla a few months ago in his post Analyzing NHibernate with NDepend, the NHibernate code is quite entangled and unfortunately this problem hasn't been fixed for NHibernate v2.0. Basically, of the 65 namespaces, 63 depend directly or indirectly on the 62 others. This is what is shown by the big black square in the dependencies matrix below, taken with the indirect dependencies option of NDepend. A black cell means that the 2 corresponding namespaces are involved in a dependency cycle, and the number displayed is the minimal length of that cycle.

 

IMHO, the NHibernate team should fix this problem asap. The article Controlling Dependencies to get Clear Architecture explains how we got rid of spaghetti in the NDepend code base for good, within a few days, about 2 years ago. In retrospect, I estimate that this is by far the best Return On Investment on quality we have ever made, much better than any of the thousands of automatic tests we wrote. As explained in the article, we now use some NDepend capabilities to prevent new cycles from appearing.

 


 

The picture below shows direct dependencies between namespaces of NHibernate. The legend is:

  1. A blue cell means: {the X Namespace} is using {the Y Namespace}.
  2. Weight of a blue cell means: W members (methods and fields) of {the X Namespace} are used by {the Y Namespace}.
  3. A green cell means: {the Y Namespace} is used by {the X Namespace}.
  4. Weight of a green cell means: W methods of {the Y Namespace} are using {the X Namespace}.
  5. A black cell means: {the X Namespace} and {the Y Namespace} are using each other.
  6. A red tick on a cell means: the coupling has been changed.
  7. A red tick with a plus on a cell means: the dependency has been created.
  8. A red tick with a minus on a cell means: the dependency has been removed.
  9. A Namespace name underlined means that its code has been changed.
  10. A Namespace name in bold means that it has been added.

 

Code quality

According to the following CQL rule, 464 methods are good candidates for refactoring because each of them violates at least one of the metric thresholds.

// <Name>Quick summary of methods to refactor</Name>

WARN IF Count > 0 IN SELECT METHODS WHERE 
                                           
// Metrics' definitions
     (  NbLinesOfCode > 30 OR              // http://www.ndepend.com/Metrics.aspx#NbLinesOfCode
        NbILInstructions > 200 OR          // http://www.ndepend.com/Metrics.aspx#NbILInstructions
        CyclomaticComplexity > 20 OR       // http://www.ndepend.com/Metrics.aspx#CC
        ILCyclomaticComplexity > 50 OR     // http://www.ndepend.com/Metrics.aspx#ILCC
        ILNestingDepth > 4 OR              // http://www.ndepend.com/Metrics.aspx#ILNestingDepth
        NbParameters > 5 OR                // http://www.ndepend.com/Metrics.aspx#NbParameters
        NbVariables > 8 OR                 // http://www.ndepend.com/Metrics.aspx#NbVariables
        NbOverloads > 6 )                  // http://www.ndepend.com/Metrics.aspx#NbOverloads

 

This stance on metrics is a bit extremist and honestly, we have a bunch of methods in the NDepend code base that still don't abide by all these thresholds. While abiding by metric thresholds helps a lot in keeping code maintainable, my opinion is that quality effort must be focused first on rationalizing componentization and dependencies: non-cyclic dependencies between components, low level components, and logical components (namespaces) instead of physical components (assemblies).

 

However, there is a fundamental metric not represented here: Test Coverage. I tried to gather Test Coverage metrics for NHibernate 2.0 so that NDepend could analyze this data. Unfortunately just running the NHibernate.Test-2.0 project with the option Run Test(s) of TestDriven.NET led to: 427 passed, 755 failed, 14 skipped, took 3738,55 seconds. I suspect I don't have all the DB stuff installed (I am not really a DB guy) and am a bit too lazy to go through all the steps to make it work. (Btw, any code coverage file produced by NCover or VSTS Coverage on NHibernate 2.0 is welcome, just send it with http://www.transferbigfiles.com/ and let me know the url where I can download it. My email is psmacchia at google mail. This way I could complete this post with the results obtained).

 

Conclusion

This awesome release of NHibernate is the result of a lot of work done by extremely talented guys. As I said, I am not a DB guy, so I cannot appreciate the real value of this framework. However, googling it for a few seconds turns up tens of thousands of enthusiastic users.

 

This post focuses on structure, quality and evolution. Thanks to NDepend capabilities, within a few minutes I found some good things but also some important room for improvement (IMHO), which I wanted to share.
 


 

OMG Rake!

Rake is just... lovely. There's no other way to describe it. I just moved our XEVA Framework to rake and ended up with this build script weighing in at only 35 lines:

Check out the :harvest task where I'm requiring an external file that contains code for bundling up the various build outputs, currently just .dll and .pdb files. The file looks like this:

I'm in love with Rake -- and the idea of executable configuration in general -- for a few reasons:

1) I can take my scripts down to a reasonable size. The NAnt script for this simple build weighed in at a hefty 162 lines. These two files combined hover around 65. That's much better signal to noise and it only gets better with the more complex scripts.

2) I can make new tasks lickety-split. With ruby you just write some code inside a task. It's easy to leverage popular libraries like Hpricot for XML poking and peeking and such. No need to compile tasks and manage separate projects just for build code.

3) I can see what's happening! Ruby is a very flexible language. I can use its tricks to pimp out my build scripts and I don't need to sift through angle bracket madness. Note how I opened up the Dir class to define a new method, exists?. Ah the joys of ruby. We can make things somewhat familiar to our .NET teams while keeping it idiomatic. With tools like IronRuby this prospect gets even juicier.

Getting Started

If you're just getting started with rake, you need ruby first. The "one click install" at the official Ruby site is what I'm using and it's a pain-free setup. There's plenty on Google, but I recommend giving the Rails Envy boys a click for a super-basic-but-effective primer. Remember the sh method is your friend and, when in doubt, RTFM!

Fast Serialization

We use a lot of serialization in the current system I work with. Serializing/deserializing 100,000,000 objects in a day is pretty common. For a long time we knew that the binary formatter was fat and slow but never rationalized writing something custom as we were always fast enough. Unfortunately our data throughput has risen 400% in the last year (when you start with gigs and gigs of messages this is a huge gain) and our little three or four year old dual Xeon 2.2 has turned into the little engine that could during peaks lately, so we finally bit the bullet and threw something together quickly.

A toast ... to the little server that could!

This solution is for a fairly niche condition and is heavily optimized so please read the explanations below to see if it will be good for your scenario before using it.

 

This is the first of a series of posts dealing with this ... Let's start with introducing a new interface to our system

    public interface ICustomBinarySerializable
    {
        void WriteDataTo(BinaryWriter _Writer);
        void SetDataFrom(BinaryReader _Reader);
    }

You would then implement this interface in your object like this. Only write out exactly what you need, and write it in the simplest way possible.

    class TestObject : ICustomBinarySerializable
    {
        public int Integer;
        public TestObject(){}

        public TestObject(int _Integer)
        {
            Integer = _Integer;
        }

        public virtual void WriteDataTo(BinaryWriter _Writer)
        {
            _Writer.Write((int) Integer);
        }

        public virtual void SetDataFrom(BinaryReader _Reader)
        {
            Integer = _Reader.ReadInt32();
        }
    }

Then I wrote a custom formatter that operates on objects that are ICustomBinarySerializable. You may note that I use an integer for the index that represents the type. A short would probably be more appropriate here, and it would save a few bytes.

    public class CustomBinaryFormatter : IFormatter
    {
        private SerializationBinder m_Binder;
        private StreamingContext m_StreamingContext;
        private ISurrogateSelector m_SurrogateSelector;
        private readonly MemoryStream m_WriteStream;
        private readonly MemoryStream m_ReadStream;
        private readonly BinaryWriter m_Writer;
        private readonly BinaryReader m_Reader;
        private readonly Dictionary<Type, int> m_ByType = new Dictionary<Type, int>();
        private readonly Dictionary<int, Type> m_ById = new Dictionary<int, Type>();
        private readonly byte[] m_LengthBuffer = new byte[4];
        private readonly byte[] m_CopyBuffer;

        public CustomBinaryFormatter()
        {
            m_CopyBuffer = new byte[20000];
            m_WriteStream = new MemoryStream(10000);
            m_ReadStream = new MemoryStream(10000);
            m_Writer = new BinaryWriter(m_WriteStream);
            m_Reader = new BinaryReader(m_ReadStream);
        }

        public void Register<T>(int _TypeId) where T:ICustomBinarySerializable
        {
            m_ById.Add(_TypeId, typeof(T));
            m_ByType.Add(typeof (T), _TypeId);
        }

        public object Deserialize(Stream serializationStream)
        {
            if(serializationStream.Read(m_LengthBuffer, 0, 4) != 4)
                throw new SerializationException("Could not read length from the stream.");
            IntToBytes length = new IntToBytes(m_LengthBuffer[0], m_LengthBuffer[1], m_LengthBuffer[2], m_LengthBuffer[3]);
            //TODO make this support partial reads from stream
            if(serializationStream.Read(m_CopyBuffer, 0, length.i32) != length.i32) 
                throw new SerializationException("Could not read " + length.i32 + " bytes from the stream.");
            m_ReadStream.Seek(0L, SeekOrigin.Begin);
            m_ReadStream.Write(m_CopyBuffer, 0, length.i32);
            m_ReadStream.Seek(0L, SeekOrigin.Begin);
            int typeid = m_Reader.ReadInt32();
            Type t;
            if(!m_ById.TryGetValue(typeid, out t))
                throw new SerializationException("TypeId " + typeid + " is not a registered type id");
            object obj = FormatterServices.GetUninitializedObject(t);
            ICustomBinarySerializable deserialize = (ICustomBinarySerializable) obj;
            deserialize.SetDataFrom(m_Reader);
            if(m_ReadStream.Position != length.i32) 
                throw new SerializationException("object of type " + t + " did not read its entire buffer during deserialization. This is most likely an imbalance between the writes and the reads of the object.");
            return deserialize;
        }

        public void Serialize(Stream serializationStream, object graph)
        {
            int key;
            if (!m_ByType.TryGetValue(graph.GetType(), out key))
                throw new SerializationException(graph.GetType() + " has not been registered with the serializer");
            ICustomBinarySerializable c = (ICustomBinarySerializable) graph; //this will always work due to generic constraint on the Register
            m_WriteStream.Seek(0L, SeekOrigin.Begin);
            m_Writer.Write((int) key);
            c.WriteDataTo(m_Writer);
            IntToBytes length = new IntToBytes((int) m_WriteStream.Position);
            serializationStream.WriteByte(length.b0);
            serializationStream.WriteByte(length.b1);
            serializationStream.WriteByte(length.b2);
            serializationStream.WriteByte(length.b3);
            serializationStream.Write(m_WriteStream.GetBuffer(), 0, (int) m_WriteStream.Position);
        }

        public ISurrogateSelector SurrogateSelector
        {
            get { return m_SurrogateSelector; }
            set { m_SurrogateSelector = value; }
        }

        public SerializationBinder Binder
        {
            get { return m_Binder; }
            set { m_Binder = value; }
        }

        public StreamingContext Context
        {
            get { return m_StreamingContext; }
            set { m_StreamingContext = value; }
        }
    }

So that it is clear how this works ... when you instantiate a custom formatter, you associate types back to integer ids. Example from my tests:

formatter.Register<TestObject>(1);

This says when you get a type id of 1 it should be a TestObject and vice versa when you write a TestObject give it a type id of 1.

 

When writing an object the format is

<4 bytes length><4 bytes type id><object data>

 

When we read the data we first read the 4 bytes of length (n), then read n bytes off the stream. We then copy that into our local buffer (see notes below). We then seek to the beginning of the buffer and tell the object to read its state using the binary reader we provide to it.
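To make the layout concrete, here is a minimal standalone sketch of the same framing, not the formatter itself. The FrameSketch class and its method names are made up for illustration, and it assumes the length counts the type id plus the object data, as the formatter above does. It also loops on Stream.Read, which is one way the partial-read TODO could be handled.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only: it frames a raw payload the
// same way the formatter does, <4 bytes length><4 bytes type id><data>.
public static class FrameSketch
{
    // BinaryWriter always writes ints in little-endian order.
    public static void WriteFrame(Stream output, int typeId, byte[] payload)
    {
        BinaryWriter writer = new BinaryWriter(output);
        writer.Write(4 + payload.Length); // length = type id + object data
        writer.Write(typeId);
        writer.Write(payload);            // raw bytes, no extra prefix
        writer.Flush();                   // not disposed: that would close the stream
    }

    // Stream.Read may return fewer bytes than requested, so keep reading.
    public static byte[] ReadFully(Stream input, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = input.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream ended after " + offset + " of " + count + " bytes.");
            offset += read;
        }
        return buffer;
    }

    // Reads one frame and returns the object data; the type id comes back via out.
    public static byte[] ReadFrame(Stream input, out int typeId)
    {
        int length = new BinaryReader(new MemoryStream(ReadFully(input, 4))).ReadInt32();
        if (length < 4)
            throw new InvalidDataException("Frame length " + length + " is too small to hold a type id.");
        BinaryReader body = new BinaryReader(new MemoryStream(ReadFully(input, length)));
        typeId = body.ReadInt32();
        return body.ReadBytes(length - 4);
    }
}
```

A three byte payload written with type id 1 takes up 11 bytes on the stream: 4 for the length, 4 for the type id and 3 for the data.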

 

 

Performance

Before we look at all of the bad and evil things this is doing, let's try some basic performance tests. To run the tests I used the following simple object (the kind of thing this library was designed to be really fast with). I grabbed this object off the blog of someone who was also playing with serialization and added the interface, but I can't seem to find the link to give credit for saving me a good minute's worth of typing :).

 

[Serializable]
public class Customer : ICustomBinarySerializable {
     private String _lastname;
     private String _firstname;
     private String _address;
     private int _age;
     private int _code;

    public Customer()
    {
        
    }
    public Customer(String lastName, String firstName, String address, int age, int code)
    {
        _lastname = lastName;
        _firstname = firstName;
        _address = address;
        _age = age;
        _code = code;
    }

    public String LastName {
           get {return _lastname;}
           set {_lastname = value;}
     }
     public String FirstName
     {
           get {return _firstname;}
           set {_firstname = value;}
     }
     public String Address
     {
           get {return _address;}
           set {_address = value;}
     }

     public int Age
     {
           get {return _age;}
           set {_age = value;}
     }

     public int Code
     {
           get {return _code;}
           set {_code = value;}
     }

    public void WriteDataTo(BinaryWriter _Writer)
    {
        _Writer.Write((string)_lastname);
        _Writer.Write((string)_firstname);
        _Writer.Write((string)_address);
        _Writer.Write((Int32)_age);
        _Writer.Write((Int32)_code);
    }

    public void SetDataFrom(BinaryReader _Reader)
    {
        _lastname = _Reader.ReadString();
        _firstname = _Reader.ReadString();
        _address = _Reader.ReadString();
        _age = _Reader.ReadInt32();
        _code = _Reader.ReadInt32();
    }
}

 

Speed

To test the speed of the serializer I chose to serialize / deserialize one of these objects 10,000,000 times to/from a MemoryStream.

Test                    Time (mm:ss.ss, lower is better)
Serialize (Binary)      01:48.54
Serialize (Custom)      00:06.73
Deserialize (Binary)    02:01.29
Deserialize (Custom)    00:08.55

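So on serializing, the custom one is roughly 16 times faster (108.54 seconds vs. 6.73, a ratio of about 1612%), and on deserializing it is roughly 14 times faster (121.29 seconds vs. 8.55, about 1418%). That's not too bad, as both are more than an order of magnitude.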

 

Size

The other area I really wanted to optimize is the size of each message, as it is common for us to have 40+ GB transaction files for a day (disk IO is expensive). Because we are not writing the same kind of schema information that the binary formatter does, our output can also be quite a bit smaller. For the object given, the BinaryFormatter results in 232 bytes of output while the custom formatter results in 41. This message has quite a few strings, which add to the amount of serialized data (across our messages, about 40 of them, we average about a 1/10 ratio between the two). Even so, it's still more than a fivefold saving in storage space (232 bytes down to 41). Don't let this fool you though, there are some  ....

 

Problems

There are a number of problems with this type of strategy. It is imperative that you know about the tradeoffs involved with this code before using it. This was written for a niche situation and it may really hurt you if you aren't careful!

 

Versioning

There is no versioning information provided by default in the data. One could easily provide this in their custom serialization implementation but the formatter does not provide it by default for you.
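As a rough sketch of what "providing this in your own implementation" could look like, the class below writes a one byte version marker ahead of its data. Everything here is an assumption made for illustration: the VersionedTestObject name, the Label field and the version numbers are not part of the original code. In the real system the class would also implement ICustomBinarySerializable.

```csharp
using System;
using System.IO;

// Hypothetical versioned variant of TestObject, for illustration only.
public class VersionedTestObject
{
    private const byte CurrentVersion = 2;

    public int Integer;
    public string Label = ""; // assumed to have been added in version 2

    public void WriteDataTo(BinaryWriter writer)
    {
        writer.Write(CurrentVersion); // always write the newest layout
        writer.Write(Integer);
        writer.Write(Label);
    }

    public void SetDataFrom(BinaryReader reader)
    {
        byte version = reader.ReadByte();
        if (version == 0 || version > CurrentVersion)
            throw new InvalidDataException("Unsupported version " + version);

        Integer = reader.ReadInt32();
        // Version 1 data has no Label, so fall back to a default. Assigning it
        // here matters because the formatter creates objects without running
        // constructors or field initializers.
        Label = version >= 2 ? reader.ReadString() : "";
    }
}
```

The cost is one extra byte per object, which is worth weighing against the size numbers above.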

 

Endianness

One of the interesting things here is dealing with the length. I have done this using a quite unsafe (but faster) solution.

    [StructLayout(LayoutKind.Explicit)]
    public struct IntToBytes
    {
        public IntToBytes(Int32 _value) { b0 = b1 = b2 = b3 = 0; i32 = _value; }
        public IntToBytes(byte _b0, byte _b1, byte _b2, byte _b3) {
            i32 = 0;
            b0 = _b0;
            b1 = _b1;
            b2 = _b2;
            b3 = _b3;
        }
        [FieldOffset(0)]
        public Int32 i32;
        [FieldOffset(0)]
        public byte b0;
        [FieldOffset(1)]
        public byte b1;
        [FieldOffset(2)]
        public byte b2;
        [FieldOffset(3)]
        public byte b3;
    }

This has endian problems if you use it on multiple machines that have different endianness, like say Mono on a PPC vs. the CLR on x86. One could get around this by using BitConverter instead (checking BitConverter.IsLittleEndian and reversing the bytes when needed, since BitConverter also follows the machine's byte order) or by doing some binary arithmetic if you miss having real reasons for doing so :). For us, however, most of these objects are being serialized between processes on the same machine, so it's not an issue.
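For anyone who does need to cross machines, the binary arithmetic route could look something like the sketch below. The LengthPrefix class is a hypothetical name, and the choice of little-endian as the fixed wire order is an assumption. Shifts work on values rather than on memory layout, so the result is the same on any CPU.

```csharp
using System;
using System.IO;

// Hypothetical replacement for the IntToBytes trick: always little-endian on
// the wire, regardless of the machine's own byte order.
public static class LengthPrefix
{
    public static void Write(Stream output, int value)
    {
        output.WriteByte((byte) (value & 0xFF));         // lowest byte first
        output.WriteByte((byte) ((value >> 8) & 0xFF));
        output.WriteByte((byte) ((value >> 16) & 0xFF));
        output.WriteByte((byte) ((value >> 24) & 0xFF)); // highest byte last
    }

    public static int Read(byte[] buffer, int offset)
    {
        return buffer[offset]
             | (buffer[offset + 1] << 8)
             | (buffer[offset + 2] << 16)
             | (buffer[offset + 3] << 24);
    }
}
```

Whether this is measurably slower than the struct overlay is something to test on your own messages; I have not benchmarked it here.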

 

Copying of Data

Another problem (read: decision) has to do with how the formatter deals with the stream itself internally. It copies data off the stream into an internal memory buffer so that it can reuse the same BinaryReader/BinaryWriter every time. This makes it non-reentrant and forces the copy, but in testing with many very small messages, copying the data turned out to be faster than creating a new reader/writer on the original stream on every iteration. This may turn out differently for you; I will leave it as an exercise for the reader to change this (I promise it won't take more than 5 minutes).

 

Typing

It's a lot of typing in your objects (we can work around this with some IL generation) but that's a whole other post, isn't it?

 

Anyways I hope people enjoy this and can find a niche place of their own to use such a strategy.

A Train of Thought – August 24th, 2008 Edition

Thank God I don’t have my old 3 hours a day of commuting into Manhattan, but I needed to spit out some little blog postlets in more than 140 characters at a time, so I present to you the latest Train of Thought.  Opinionated blathering ahead, comments are always open at the bottom.

 

Don’t Overreach in Your Designs

One of my older posts on The Last Responsible Moment and another post on evolutionary design were recently linked on some of the DZone/DotNetKicks sites and got some traffic.  I got some comments and emails to the effect of “we did evolutionary design and it bit us in the ass with all that refactoring and rewriting.”  Maybe, but let’s talk about how to do evolutionary design in a way that minimizes outright rework.  From my experience, the worst rework results from choosing elaborate abstractions upfront that turn out to be harmful.  The analogy that I like is trying to walk on slippery ground.  Anybody who’s walked across an icy patch or a muddy field knows that the way to do it is to keep your feet as close to your center of gravity as possible by taking short steps.  If you take a big step you’re much more likely to slip and fall.  Design is the same way.  Bad things happen when you allow your design thinking and abstractions to get ahead of your development and requirements.

We’re doing evolutionary design, and yes we have had to rewrite some functionality when we’ve found shortcomings in the design or simply found a better way to do it.  I would attribute the worst example of avoidable rework on our project to overreaching with some infrastructure outside of user stories.  We’re using the new ASP.Net MVC framework, and we didn’t like the way that it handles (or really doesn’t handle) the “M” part of the MVC triad.  We had one of those conversations that starts with “wouldn’t it be cool if…” and ended with one or both of us spending days of architectural spiking on an approach for screen synchronization – before we created our first web page.  We created the idea of a “ViewModel” that would represent screen state and help us to move data between the web page form elements and our Domain Model objects.  We wrote a very elaborate code generation scheme to automate a lot of the grunt coding.  As soon as we started to work on our first couple web pages we quickly realized that much of our ViewModel infrastructure was unnecessary or just plain wrong.  We effectively rewrote the ViewModel code generation in a simpler way and got on with the project.  Since then, we’ve extended the ViewModel code generation to add new behaviors on an as needed basis, but we haven’t had to rewrite any of it.

Just to head off the comments, I didn’t know about the BindingHelperExtensions in the MVC at the time (shame on me).  I don’t regret rolling our own infrastructure at all because I don’t think that BindingHelperExtensions is adequate, but I wish we’d played it a little smarter and put off the ViewModel code generation until we had a couple working pages to point out the real patterns.

What I’m trying to say here is to avoid speculative abstractions and fancy patterns outside of feedback from the real features and needs of the system.  It’s relatively painless to extend simple code for more elaborate usages than its original intentions, but it hurts to throw out or change elaborate code.  You can hedge your design bets by (almost) always starting simple.

 

Enabling Evolutionary Design

So, how do you do evolutionary design without incurring a lot of rework?  Here’s my recipe:

  • Worry a lot about cohesion and coupling as you work. 
  • Follow the Open/Closed Principle
  • Follow the Single Responsibility Principle
  • Use TDD or BDD for low level design.  First because it does more to ensure good cohesion and coupling on a class by class basis than any other technique, but also because the automated unit testing coverage left behind makes changes in the code cheaper in many cases.  The cost and risk of regression testing that comes with changing code is a considerable roadblock to making design improvements midstream.  If you reduce that cost and risk, evolutionary approaches are a lot more attractive.  That test coverage is one of the ways that TDD/BDD is more valuable a practice than merely applying some unit tests after the fact to strategic areas of the code.

From an old post:

One way to think about TDD is an analogy to Lego blocks. The Lego sets I had as a child were the very basic block shapes. Using a lot of little Lego pieces, you can build almost anything your imagination can create. If you buy a fancy Lego set that has a single large piece shaped like a pirate's ship, all you can make is a variation of a pirate ship.

In that context I was talking about TDD, but I feel like the analogy holds very true for doing evolutionary design effectively.  Composing your system of small Lego pieces that can be rearranged is much better than using big monolithic pieces of code that are more likely to be modified later.

In the end, it really amounts to just designing things well, all the time.  Unsurprisingly, I think that teams with strong software design skills are best equipped to do evolutionary design.

 

On Software Factories

Last week I was at an Open Spaces event in Colorado with a very diverse group of folks in a session that rambled around until “Software Factories” came up.  I stated, and not for the first time, that Software Factories are often Big Design Upfront dressed up in sexier new clothes.  I definitely think the software factory idea can work (with Ruby on Rails as exhibit A), but I think the activity of defining elaborate project and class templates upfront is risky or at least suboptimal.  A project team is going to have much less willingness to reconsider designs if design changes require changing the software factory templates.  To me, software factory techniques will succeed if and only if it’s easy to modify the factory automation guidance as the team works and learns more about their system.

My other point with software factories was that I think micro code generation (live templates, file templates, ReSharper tricks, scaffolding, etc.) where the developer is in complete control has a much better chance of succeeding than the elaborate factories that try to generate most of the application for you. 

 

Opinionated Software

Ruby on Rails introduced “Opinionated Software” into the common lexicon, but it’s been around for a while.  I think that my team is gaining some advantages from our design’s “opinions,” but what if you don’t like the opinions of your chosen framework?  Take CSLA.Net as an example.  I want absolutely nothing to do with CSLA.Net because I think its basic model is severely flawed, but I bet that it’s providing a lot of value for many teams.  That value is largely because CSLA.Net has firmly ingrained “opinions” about how almost every common development task should be done.  I can’t use CSLA.Net, and a lot of the Microsoft tooling for that matter, because I don’t agree with the “opinions” baked into that tooling.  I’ll happily build my own infrastructure to support the way that *I* feel software should be created, or go around a piece of infrastructure I don’t agree with.  Heck, the MVC framework isn’t even released and we’ve already considerably diverged from its opinions.  Other developers will simply go with the flow of whatever tooling they’re using and invest time into learning the idioms of that particular tool and not waste time questioning that tool. 

I think this comes down to a question of “go with the flow or change the course of the river.”  I’m a “change the course of the river” to the optimal path kind of guy, but I frequently wonder if it would be better to just give up and go with the flow.

 

TypeMock is only a Bullet of Ordinary Composition

I was out of pocket last week at a little open spaces event, so I missed most of the latest Twitter and blogging flareup of the TypeMock question.  I’ll repeat my opinion that there’s nothing inherently wrong with TypeMock itself, but I think that the rhetoric from TypeMock proponents is often harmful to the greater discussion of software design and practices.

TypeMock might be a better mocking framework than Rhino Mocks or Moq, but it does NOT change the fundamental rules of mock object usage.  Just because you can use TypeMock to mock a dependency doesn’t mean that it’s the right thing to do.  Let’s remember some of my rules of mock object usage:

  • Don’t ever try to mock chatty interfaces like ADO.Net or anything related to HttpContext because the effort to reward ratio is all wrong and you can never read those tests anyway. 
  • Be extremely cautious of mocking interfaces that you do not understand. 
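
As a rough illustration of the first rule, here is a minimal Python sketch using the standard library's unittest.mock rather than any .NET tool.  The DB-API-style connection, the invoice_gateway object, and its find_overdue_ids method are all hypothetical names invented for this example, and hiding the chatty API behind a narrow interface is just one common response to the rule, not something the rule itself prescribes.

```python
from unittest.mock import MagicMock


def overdue_invoice_ids_chatty(connection):
    """Talks directly to a chatty, DB-API-style connection (hypothetical schema)."""
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT id FROM invoices WHERE overdue = 1")
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


def overdue_invoice_ids(invoice_gateway):
    """Talks to a narrow, hypothetical gateway with one intention-revealing method."""
    return sorted(invoice_gateway.find_overdue_ids())


# Mocking the chatty API: the test has to mirror every step of the conversation
# (cursor, execute, fetchall, close), so it mostly restates the implementation.
connection = MagicMock()
connection.cursor.return_value.fetchall.return_value = [(7,), (3,)]
chatty_result = overdue_invoice_ids_chatty(connection)
connection.cursor.return_value.execute.assert_called_once_with(
    "SELECT id FROM invoices WHERE overdue = 1"
)
connection.cursor.return_value.close.assert_called_once_with()

# Mocking the narrow interface: one stubbed call, and the test reads like the
# requirement instead of like the plumbing.
gateway = MagicMock()
gateway.find_overdue_ids.return_value = [7, 3]
narrow_result = overdue_invoice_ids(gateway)
```

Both tests pass, but the first one breaks the moment the code switches from fetchall to iterating the cursor, even though the behavior is unchanged.  That is the effort to reward problem in miniature.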

The only thing that TypeMock changes is *how* the mock object is introduced into the code being tested.  If you really think that having separate interface definitions plus Dependency Injection is hard (an assertion that I would obviously dispute in this age of auto wiring, auto mocking containers, ReSharper, and convention driven configuration of IoC containers), then yeah, use TypeMock.  Just remember a couple things please:

  • Mocking in general isn’t going to be an effective technique with classes that aren’t cohesive or have a lot of semantic coupling with their dependencies.  In other words, interaction testing with any mock object is going to be painful with badly written code.  TypeMock simply doesn’t change that equation.  I’ve heard TypeMock put forward as a solution for unit testing legacy code.  In theory yes, but the reality that I’ve found is that interaction testing inside Legacy Code is an exercise in pain.  Most legacy code (and I’m using the Feathers definition of legacy code here) has very poor internal structure and poor separation of concerns.  Exactly the kind of code that you shouldn’t bother using interaction testing on.  I’d instead recommend surrounding Legacy Code with more coarse grained integration tests to preserve behavior first, then trying to modify the internal code to a better structure before writing fine grained unit tests.  Yes, it is possible to use TypeMock to “unit test” typical legacy code, but those tests would almost automatically be the type of overspecified unit tests that cause more harm than good.  The problem with legacy code is often the structure of the code more than the fact that it doesn’t have any unit tests.
  • Yes, you can unit test the class in question that news up its own dependencies and calls static methods, but you still have a very tight runtime coupling to those dependencies and the static method calls.  Regardless of your ability to unit test the class in question, that tight coupling can often be a problem.  Your ability to reuse those classes is compromised by the tight dependencies.  Your ability to practice evolutionary design is compromised because of the tight coupling.  Remember that Dependency Inversion and Inversion of Control have other benefits than just unit testing.
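
To make the second point concrete, here is a minimal Python sketch, again using the standard library's unittest.mock as a stand-in.  It is not TypeMock's API or any .NET code, and every class name in it (SmtpMailer, TightNotifier, InjectedNotifier, RecordingMailer) is hypothetical.  It only shows the two ways of getting a fake into the code under test: swapping a hard-wired dependency at runtime versus handing the dependency in through the constructor.

```python
from unittest.mock import patch


class SmtpMailer:
    """Hypothetical concrete dependency; the real thing would need a mail server."""

    def send(self, to, body):
        raise RuntimeError("no mail server in a unit test")


class TightNotifier:
    """News up its own dependency, so it is welded to SmtpMailer."""

    def notify(self, user):
        SmtpMailer().send(user, "Your order shipped")
        return True


class InjectedNotifier:
    """Takes anything with a send(to, body) method through its constructor."""

    def __init__(self, mailer):
        self._mailer = mailer

    def notify(self, user):
        self._mailer.send(user, "Your order shipped")
        return True


class RecordingMailer:
    """Hand-rolled fake that just records what it was asked to send."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


# Style 1: swap the method out at runtime. The test passes, but TightNotifier
# still cannot be used with any mailer other than SmtpMailer.
with patch.object(SmtpMailer, "send") as fake_send:
    tight_ok = TightNotifier().notify("ann@example.com")
    fake_send.assert_called_once_with("ann@example.com", "Your order shipped")

# Style 2: hand the dependency in. The same seam that makes the test easy
# also lets production code choose a different mailer later.
recorder = RecordingMailer()
injected_ok = InjectedNotifier(recorder).notify("ann@example.com")
```

Both styles give you a green test.  Only the second leaves you with a class you can reuse or rewire without touching its source, which is the benefit that has nothing to do with testing.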

I think the TypeMock proponents are too focused on unit testing in a way.  I firmly believe that code that can’t be efficiently unit tested is almost automatically bad code (to me, testability == productivity).  However, code that can be unit tested isn’t necessarily good.

To recap, I don’t think there’s anything wrong with TypeMock per se, but I think that much of the TypeMock proponents’ rhetoric is irresponsible.  Just because TypeMock *can* do something doesn’t mean that doing that something is a good idea.

 

 

In Tribute to George Carlin

I couldn’t think of 7, and it’s a couple months late for a Carlin tribute, but here’s my list of the words or phrases that are henceforth banned from appearing in my blog or presence (starting right now).  Almost no conversation is going to be useful if it includes one of these words:

  • Mort – Apparently Microsoft is now referring to the developer formerly known as “Mort” as the “Pragmatic Developer.”  Puh-leeze.  Everybody in the world thinks that they’re pragmatic, yet we disagree on many significant questions about the best way to build software.  I was dead set against ALT.NET getting renamed “Pragmatic.Net” for the same reasons.  I gotta say though, “Pragmatic Developer” is much less of a pejorative than “Mort” became, or than the typical “Joe Schmoe Developer who builds LOB apps at General Motors” line you hear from Microsoft employees.
  • Entity Framework – At least until there’s something new to say.  I’m liking that my attention lately has been on the advance of Fluent NHibernate instead of worrying about a tool that I’m very unlikely to use in the next 2-3 years.
  • Stored Procedures – I’ve seen nothing to change my opinion about sprocs for several years (good for edge cases and utility database scripts, bad everywhere else, i.e. 95%+ of the time I think sprocs are unnecessary)
  • TypeMock
     
  • “Vietnam of Software Development” – Most overblown and misused analogy this side of Software as Construction.
  • “Software as Construction” – I worked on the engineering side of construction projects measured in the hundreds of millions of dollars and even billion dollar+ projects (and this was in the pre-W days when the USD was more than paper money), plus I worked for my father building houses as well.  I feel perfectly qualified to say that the “Software as Construction” analogy is an extremely poor fit.  Software as Manufacturing is better, but I bet that somebody will write a rant about that comparison in the next couple years.
  • Foo Considered Evil – It’s a cliche now
  • “Cargo Cult” – used as a magic talisman to win any argument, regardless of whether the use of the phrase is applicable or not.
  • “Your Emperor has no Clothes” – see above
  • “Jumped the Shark” – see above
  • “You should just use whatever is best for your project” – The intellectual equivalent of empty calories
  • “You’re just being dogmatic!” – Lamest way to try to win an argument.  Basically, this is code for “I’m pissed that you don’t agree with me so I’m just going to call you names and declare victory, so there!”
  • “You can just Refactor it later” – You can write simplistic code upfront and say you’ll refactor it later to eliminate duplication or handle more complicated cases as those cases arise, but you don’t write bad code on purpose.  You certainly don’t use Refactoring as an excuse to just not think about design.
  • “We’re refactoring” when the team really means “we’re rewriting that code altogether.”  There’s no such thing as a big refactoring.

 

 

Okay, I’m done.  Your turn:
