CodeBetter.Com

Greg Young [MVP]

  • A New Year a New Chapter


    I usually try to keep everything here extremely technical but here is a short update about me in general. As some of you may know I have been planning a move for a few weeks now. I didn't really know quite where I was going, only that I was in for change. I don't really intend to settle down anywhere but instead to move from place to place.



    As of 8am on Saturday my leaving was complete. I arrived here in Montreal where I will be staying as a visitor for a few months, hopefully I can become conversational in French again before I leave. I arrived with everything that I owned contained in 2 suitcases with a total weight of 110 lbs (I need to get rid of a few more things to get myself below the airline weight limit). I have to say that the getting rid of all of your possessions is extremely liberating and I am looking forward to the next period of my life without a focus on material possessions.



    On the work front I hope to be picking up a local contract for 3-6 months. It was very hard to get this set up beforehand with everyone being on vacation, but hopefully it will come together fairly easily. If you are in the US and are looking for someone, I do have a few weeks of availability until I take something, or may not take something if it goes longer. If interested, drop me an email.





    Everything here in Montreal is quite different but some things are pretty easy to pick up. Go Nucks Habs Go! I already have a good number of funny greg-was-speaking-french-and ... stories to share at the next conference (alt.net seattle I am thinking, though it's a much longer trip from here!).



    I hope to be spending more time focused on writing while I am here so expect some more content!
     

  • DDD: Specifications, Language, and Locality

    Consider the following code.

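    public class Customer {
        // ... other members elided ...

        // True when the customer meets either the high threshold on one measure
        // or the lower threshold on both measures.
        public bool IsPreferredCustomer {
            get {
                return (this.TotalSales > 10000 || this.TotalVisits > 50)
                    || (this.TotalSales > 5000 && this.TotalVisits > 25);
            }
        }
    }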

    In this example a common pattern is being followed where an attribute IsPreferredCustomer has been exposed directly from the Customer object. The attribute in terms of language is being used to abstract the concept of a customer who has spent more than ten thousand dollars, has made more than fifty visits, or has both more than five thousand dollars in sales and more than twenty-five visits to the location.

     

    There are other options for implementing this, however, namely the Specification Pattern.

    public class IsAPreferredCustomer : Specification<Customer> {
        public bool Matches(Customer customer) {
            // A missing customer can never be preferred.
            if (customer == null) return false;
            return (customer.TotalSales > 10000 || customer.TotalVisits > 50)
                || (customer.TotalSales > 5000 && customer.TotalVisits > 25);
        }
    }

     

    So the question quickly becomes which of these is correct under what circumstances? In order to answer this question one must look at the strengths and weaknesses of each.

     

    Language

    The main difference between the two examples can be seen in how they are represented in terms of the ubiquitous language.

     

    When dealing with the attribute approach in the ubiquitous language the attribute is applied to every instance when discussing the entity. It is treated just like any other attribute, the name of the customer as an example. By simply listening to the language it is unspecified whether this represents a calculation being performed or is a value associated with the entity.

     

    The Specification maintains a subtly different linguistic representation. The specification applies a name to a constraint. Since it gives a name to a constraint, it is known from its usage that it is a calculation as opposed to a denormalization. It is also made clear, because it is defined as a specification, that its purpose is to constrain. When an attribute is available upon an entity, constraints can be but do not necessarily have to be made upon it. In other words, the attribute is available to constraining code but may have another purpose in general. The specification makes the intent of constraining inherent in the language, which is a primary goal of the ubiquitous language: making the implicit explicit.

     

    Code Locality

    When an attribute is added to the entity it is, being part of the entity, necessarily located near the rest of the code associated with that entity. This can be an advantage in terms of maintenance, as the code associated with the entity is localized and can be easier to find and follow for a developer traversing the source base.

     

    The localization of the validation logic can also be a hindrance. Because the code maintains its locality one can run into code explosion when there are many of these attributes. The area continues to grow as new code gets added eventually reaching a point where it has crossed the threshold of being able to be kept straight in a single location. Refactoring away from attributes at this point can be a pain.

     

    Another problem with maintaining the locality is that we can only reasonably have a single implementation of the attribute. If one were faced with a multi-tenant situation or having multiple deployments with varying rules the specification would be preferred as it provides a seam for replacement.
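    To make the "seam" concrete, here is a minimal sketch. It assumes a hypothetical ISpecification<T> interface and a trimmed-down Customer; the second rule (VisitsOnlyPreferredCustomer) and the SeamDemo class are invented purely for illustration and are not from any real deployment.

```csharp
using System;

// Hypothetical minimal types, for illustration only.
public interface ISpecification<T>
{
    bool Matches(T candidate);
}

public class Customer
{
    public decimal TotalSales;
    public int TotalVisits;
}

// The rule from the example above.
public class DefaultPreferredCustomer : ISpecification<Customer>
{
    public bool Matches(Customer c)
    {
        if (c == null) return false;
        return (c.TotalSales > 10000 || c.TotalVisits > 50)
            || (c.TotalSales > 5000 && c.TotalVisits > 25);
    }
}

// An invented rule for a second tenant: visits alone decide.
public class VisitsOnlyPreferredCustomer : ISpecification<Customer>
{
    public bool Matches(Customer c)
    {
        return c != null && c.TotalVisits > 50;
    }
}

public static class SeamDemo
{
    // Calling code depends only on the interface, so the rule can be swapped per deployment.
    public static string Describe(Customer c, ISpecification<Customer> preferred)
    {
        return preferred.Matches(c) ? "preferred" : "regular";
    }

    public static void Main()
    {
        Customer c = new Customer { TotalSales = 12000, TotalVisits = 3 };
        Console.WriteLine(Describe(c, new DefaultPreferredCustomer()));    // preferred
        Console.WriteLine(Describe(c, new VisitsOnlyPreferredCustomer())); // regular
    }
}
```

    With the attribute version there is nowhere to make this substitution short of editing the entity itself.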

     

    The C# programming language offers some interesting ways of dealing with the possible explosion of code within the original entity. Partial classes allow a class to be defined across multiple files and extension methods allow static methods stored elsewhere to “appear” to be a part of the entity while they are in fact just static methods. In the case of constraints this is actually a good thing as the constraints don’t really “belong” in the entity but are describing it. These types of syntactical sugar can make things much easier to deal with in terms of dealing with possible code explosion and should be considered as options if they are available in a given language. As an example consider the following C# code that uses both the specification pattern and extension methods to create what some may consider a more concise API around the previous example.
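    A minimal sketch of what such an API might look like. The shape of Specification<T> (an abstract base with a Matches method) is an assumption, and the extension method name IsPreferred and the class CustomerConstraints are hypothetical names chosen for this illustration.

```csharp
using System;

public class Customer
{
    public decimal TotalSales;
    public int TotalVisits;
}

// Assumed shape of the base type used in the earlier listing.
public abstract class Specification<T>
{
    public abstract bool Matches(T candidate);
}

public class IsAPreferredCustomer : Specification<Customer>
{
    public override bool Matches(Customer customer)
    {
        if (customer == null) return false;
        return (customer.TotalSales > 10000 || customer.TotalVisits > 50)
            || (customer.TotalSales > 5000 && customer.TotalVisits > 25);
    }
}

// Lives outside Customer, possibly in another file or assembly.
public static class CustomerConstraints
{
    private static readonly Specification<Customer> Preferred = new IsAPreferredCustomer();

    // Hypothetical name; reads like a member of Customer at the call site
    // but is just a static method delegating to the specification.
    public static bool IsPreferred(this Customer customer)
    {
        return Preferred.Matches(customer);
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        Customer c = new Customer { TotalSales = 6000, TotalVisits = 30 };
        Console.WriteLine(c.IsPreferred()); // True
    }
}
```

    The entity stays free of the constraint logic, while callers still get the short customer.IsPreferred() form.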

     

    Encapsulation

    Many bring up encapsulation as a key difference between the two patterns shown. The attribute-based example encapsulates the logic representing the constraint within it. One must question whether the entity is in fact the proper place to put this logic. At first smell it would seem that it is, since in order to write a constraining specification the specification needs access to the internal state of the entity.

     

    The need for a constraining specification often leads to exposing the entity's state globally, which as illustrated previously [see Getter/Setter Anti-Pattern] can cause other problems within the domain. The solution to this issue is to adjust the tools being used as opposed to the way modeling is performed; some languages have constructs such as friend classes that handle this case in a more elegant manner.

     

    Analysis

    Upon analyzing the various usages it becomes apparent that the use of an attribute to expose the constraint is usually an anti-pattern. A specification will always better model a constraint in the ubiquitous language. The intention of the modeler to constrain is also made explicit while remaining implicit with the attribute version. While the [Programmer Pornography] of encapsulation or especially code locality may end up with an advantage to the attribute based version it is also a long term risk as it can easily explode and is difficult to refactor.

     

    Rule of Thumb: Avoid the use of attributes on entities that are constraints. Prefer to use a specification as it makes the intent to constrain explicit.

    Posted Dec 21 2008, 02:42 PM by Greg with 17 comment(s)
  • Quiz Answers

     

    Assume a class A that has a static integer field Ten.

    Assume a class B that has a static integer field Ten.

     

     

    What must be true for the following lines to output 32?

     

    A.Ten = 10;
    B.Ten = 10;
    Console.WriteLine(A.Ten + ++B.Ten + A.Ten);

     

     

     

    Many got this wrong not because they misunderstood the problem (or even had the wrong answer) ... but because they did not read the question properly.

     

    What must be true for the following lines to output 32

     

    As an example, many answered that A must inherit from B. This is a correct possibility but it is not the only one ... it does not need to be true. There are 4 possibilities for how this could happen:

     

    A : B (static in B)

    B : A (static in A)

    A : X, B : X (static in X); there are infinite permutations of this but they can be represented in a single pattern

    A : X, B : X, X : Y (static in Y); again infinite permutations

     

    We could say that A and B share an ancestry where the static variable is defined at or above the point of divergence.
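    A minimal sketch of the third case (A : X, B : X, static in X). The class name X follows the list above; QuizDemo and Compute are names invented for this illustration.

```csharp
using System;

class X { public static int Ten; }   // the single shared static field
class A : X { }
class B : X { }

static class QuizDemo
{
    public static int Compute()
    {
        A.Ten = 10;
        B.Ten = 10;   // same storage location as A.Ten
        // Operands are evaluated left to right:
        // A.Ten reads 10, ++B.Ten bumps the shared field to 11 and yields 11,
        // and the final A.Ten reads the same field, now 11.
        return A.Ten + ++B.Ten + A.Ten;
    }

    static void Main()
    {
        Console.WriteLine(Compute()); // 32
    }
}
```

    If A and B each declared their own Ten, the same expression would give 10 + 11 + 10 = 31.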

     

    Glad to see everyone had on their thinking caps!

  • Interesting Code

    Umm yeah.

     

        class A
        {
            public class B : A
            {
            }
        }



        //slightly weird
        class C : A.B
        {
            public void Foo(C.B b)
            {
            }
        }



        //Love the edge conditions

        class D : A
        {
            public void Foo(D.B.B.B.B.B.B.B.B.B.B.B.B.B.B.B.B b)
            {
            }
        }

  • Required Course

    I don't recommend very much stuff to people that actually costs anything more than time but ...

     

    Udi is coming over and doing his SOA course in Austin. I can't recommend highly enough that people take this if they want to learn SOA/Distributed Systems right.

     

    http://www.headspringsystems.com/soa/ 

     

     That and you can enjoy the esoteric discussions Udi and I end up in over a few beers late at night in a random conference hotel without thinking we disagree with each other :)

  • Fast Serialization

    We use a lot of serialization in the current system I work with. Serializing/deserializing 100,000,000 objects in a day is pretty common. For a long time we knew that the binary formatter was fat and slow but could never justify writing something custom as we were always fast enough. Unfortunately our data throughput has risen 400% in the last year (when you start with gigs and gigs of messages this is a huge gain) and our little three or four year old dual Xeon 2.2 has turned into the little engine that could during peaks lately, so we finally bit the bullet and threw something together quickly.

    A toast ... to the little server that could!

    This solution is for a fairly niche condition and is heavily optimized so please read the explanations below to see if it will be good for your scenario before using it.

     

    This is the first of a series of posts dealing with this ... Let's start with introducing a new interface to our system

        public interface ICustomBinarySerializable
        {
            void WriteDataTo(BinaryWriter _Writer);
            void SetDataFrom(BinaryReader _Reader);
        }

    You would then implement this interface in your object like this. Only write out exactly what you need, and write it in the simplest way possible.

        class TestObject : ICustomBinarySerializable
        {
            public int Integer;
            public TestObject(){}
    
            public TestObject(int _Integer)
            {
                Integer = _Integer;
            }
    
            public virtual void WriteDataTo(BinaryWriter _Writer)
            {
                _Writer.Write((int) Integer);
            }
    
            public virtual void SetDataFrom(BinaryReader _Reader)
            {
                Integer = _Reader.ReadInt32();
            }
        }

    Then I wrote a custom formatter that operates on objects that are ICustomBinarySerializable. You may note that for the index that represents the type I use an integer. A short would probably be more appropriate than an integer and would save a few bytes here.

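        using System;
        using System.Collections.Generic;
        using System.IO;
        using System.Runtime.Serialization;

        public class CustomBinaryFormatter : IFormatter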
        {
            private SerializationBinder m_Binder;
            private StreamingContext m_StreamingContext;
            private ISurrogateSelector m_SurrogateSelector;
            private readonly MemoryStream m_WriteStream;
            private readonly MemoryStream m_ReadStream;
            private readonly BinaryWriter m_Writer;
            private readonly BinaryReader m_Reader;
            private readonly Dictionary<Type, int> m_ByType = new Dictionary<Type, int>();
            private readonly Dictionary<int, Type> m_ById = new Dictionary<int, Type>();
            private readonly byte[] m_LengthBuffer = new byte[4];
            private readonly byte[] m_CopyBuffer;
    
            public CustomBinaryFormatter()
            {
                m_CopyBuffer = new byte[20000];
                m_WriteStream = new MemoryStream(10000);
                m_ReadStream = new MemoryStream(10000);
                m_Writer = new BinaryWriter(m_WriteStream);
                m_Reader = new BinaryReader(m_ReadStream);
            }
    
            public void Register<T>(int _TypeId) where T : ICustomBinarySerializable
            {
                m_ById.Add(_TypeId, typeof(T));
                m_ByType.Add(typeof (T), _TypeId);
            }
    
            public object Deserialize(Stream serializationStream)
            {
                if(serializationStream.Read(m_LengthBuffer, 0, 4) != 4)
                    throw new SerializationException("Could not read length from the stream.");
                IntToBytes length = new IntToBytes(m_LengthBuffer[0], m_LengthBuffer[1], m_LengthBuffer[2], m_LengthBuffer[3]);
                //TODO make this support partial reads from stream
                if(serializationStream.Read(m_CopyBuffer, 0, length.i32) != length.i32) 
                    throw new SerializationException("Could not read " + length.i32 + " bytes from the stream.");
                m_ReadStream.Seek(0L, SeekOrigin.Begin);
                m_ReadStream.Write(m_CopyBuffer, 0, length.i32);
                m_ReadStream.Seek(0L, SeekOrigin.Begin);
                int typeid = m_Reader.ReadInt32();
                Type t;
                if(!m_ById.TryGetValue(typeid, out t))
                    throw new SerializationException("TypeId " + typeid + " is not a registered type id");
                object obj = FormatterServices.GetUninitializedObject(t);
                ICustomBinarySerializable deserialize = (ICustomBinarySerializable) obj;
                deserialize.SetDataFrom(m_Reader);
                if(m_ReadStream.Position != length.i32) 
                    throw new SerializationException("object of type " + t + " did not read its entire buffer during deserialization. This is most likely an imbalance between the writes and the reads of the object.");
                return deserialize;
            }
    
            public void Serialize(Stream serializationStream, object graph)
            {
                int key;
                if (!m_ByType.TryGetValue(graph.GetType(), out key))
                    throw new SerializationException(graph.GetType() + " has not been registered with the serializer");
                ICustomBinarySerializable c = (ICustomBinarySerializable) graph; //this will always work due to generic constraint on the Register
                m_WriteStream.Seek(0L, SeekOrigin.Begin);
                m_Writer.Write((int) key);
                c.WriteDataTo(m_Writer);
                IntToBytes length = new IntToBytes((int) m_WriteStream.Position);
                serializationStream.WriteByte(length.b0);
                serializationStream.WriteByte(length.b1);
                serializationStream.WriteByte(length.b2);
                serializationStream.WriteByte(length.b3);
                serializationStream.Write(m_WriteStream.GetBuffer(), 0, (int) m_WriteStream.Position);
            }
    
            public ISurrogateSelector SurrogateSelector
            {
                get { return m_SurrogateSelector; }
                set { m_SurrogateSelector = value; }
            }
    
            public SerializationBinder Binder
            {
                get { return m_Binder; }
                set { m_Binder = value; }
            }
    
            public StreamingContext Context
            {
                get { return m_StreamingContext; }
                set { m_StreamingContext = value; }
            }
        }

    So that it is clear how this works ... when you instantiate a custom formatter, you associate types back to integer ids. Example from my tests:

    formatter.Register<TestObject>(1);

    This says when you get a type id of 1 it should be a TestObject and vice versa when you write a TestObject give it a type id of 1.

     

    When writing an object the format is

    <4 bytes length><4 bytes type id><object data>

     

    When we read the data we first read the 4 bytes of length (n), then read n bytes off the stream. We then copy that into our local buffer (see notes below). We then seek to the beginning of the buffer and tell the object to read its state using the binary reader we provide to it.
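    The framing can be sketched on its own, separate from the formatter. This is not the formatter's code: it skips the reusable internal buffer and the IntToBytes struct and instead leans on BinaryWriter and BinaryReader, which always use little-endian byte order. FramingDemo, WriteFrame and ReadFrame are hypothetical names.

```csharp
using System;
using System.IO;

public static class FramingDemo
{
    // Writes <4 bytes length><4 bytes type id><object data>.
    // The length counts the type id plus the data, as in the formatter above.
    public static void WriteFrame(Stream target, int typeId, byte[] data)
    {
        // Not disposed on purpose: disposing the writer would close the stream.
        BinaryWriter writer = new BinaryWriter(target);
        writer.Write(4 + data.Length);
        writer.Write(typeId);
        writer.Write(data);
        writer.Flush();
    }

    public static byte[] ReadFrame(Stream source, out int typeId)
    {
        BinaryReader reader = new BinaryReader(source);
        int length = reader.ReadInt32();
        if (length < 4) throw new InvalidDataException("Frame too short to hold a type id.");
        typeId = reader.ReadInt32();
        byte[] data = reader.ReadBytes(length - 4);
        if (data.Length != length - 4) throw new EndOfStreamException("Truncated frame.");
        return data;
    }

    public static void Main()
    {
        MemoryStream stream = new MemoryStream();
        WriteFrame(stream, 1, new byte[] { 42, 0, 0, 0 });
        Console.WriteLine(stream.Length);                 // 12

        stream.Position = 0;
        int typeId;
        byte[] data = ReadFrame(stream, out typeId);
        Console.WriteLine(typeId + " " + data.Length);    // 1 4
    }
}
```

    Because every frame carries its own length, a reader can skip or forward a message without knowing how to deserialize its type.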

     

     

    Performance

    Before we look at all of the bad and evil things this is doing, let's try some basic performance tests. To run tests I used the following simple object (the kind of thing this library was designed to be really fast with). I grabbed this object off the blog of someone who was also playing with serialization and added the interface, but I can't seem to find the link to give credit for saving me a good minute's worth of typing :).

     

    [Serializable]
    public class Customer : ICustomBinarySerializable {
         private String _lastname;
         private String _firstname;
         private String _address;
         private int _age;
         private int _code;
    
        public Customer()
        {
            
        }
        public Customer(String lastName, String firstName, String address, int age, int code)
        {
            _lastname = lastName;
            _firstname = firstName;
            _address = address;
            _age = age;
            _code = code;
        }
    
        public String LastName {
               get {return _lastname;}
               set {_lastname = value;}
         }
         public String FirstName
         {
               get {return _firstname;}
               set {_firstname = value;}
         }
         public String Address
         {
               get {return _address;}
               set {_address = value;}
         }
    
         public int Age
         {
               get {return _age;}
               set {_age = value;}
         }
    
         public int Code
         {
               get {return _code;}
               set {_code = value;}
         }
    
        public void WriteDataTo(BinaryWriter _Writer)
        {
            _Writer.Write((string)_lastname);
            _Writer.Write((string)_firstname);
            _Writer.Write((string)_address);
            _Writer.Write((Int32)_age);
            _Writer.Write((Int32)_code);
        }
    
        public void SetDataFrom(BinaryReader _Reader)
        {
            _lastname = _Reader.ReadString();
            _firstname = _Reader.ReadString();
            _address = _Reader.ReadString();
            _age = _Reader.ReadInt32();
            _code = _Reader.ReadInt32();
        }
    }

     

    Speed

    To test the speed of the serializer I chose to serialize / deserialize one of these objects 10,000,000 times to/from a MemoryStream.

    Test                   Time (mm:ss.ss, lower is better)
    Serialize (Binary)     01:48.54
    Serialize (Custom)     00:06.73
    Deserialize (Binary)   02:01.29
    Deserialize (Custom)   00:08.55

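    So on serializing, the custom one takes roughly one sixteenth of the time (108.54 s vs 6.73 s, a ratio of about 16.1), and on deserializing roughly one fourteenth (121.29 s vs 8.55 s, a ratio of about 14.2). That's not too bad as both are more than an order of magnitude.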

     

    Size

    The other area I really wanted to optimize is the size of each message, as it is common for us to have 40+ GB transaction files for a day (disk IO is expensive). Because we are not writing the same kind of schema information that the binary formatter does, we can also be quite a bit smaller than its output. For the object given, the BinaryFormatter results in 232 bytes of output while the custom formatter results in 41. This message has quite a few strings, which add to the amount of serialized data (across our own messages, about 40 of them, we average about a 1/10 ratio between the two). Even so it's still more than a fivefold reduction (232 / 41 is about 5.7) in storage space required. Don't let this fool you though, there are some  ....

     

    Problems

    There are a number of problems with this type of strategy. It is imperative that you know about the tradeoffs involved with this code before using it. This was written for a niche situation and it may really hurt you if you aren't careful!

     

    Versioning

    There is no versioning information provided by default in the data. One could easily provide this in their custom serialization implementation but the formatter does not provide it by default for you.
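    One way to provide this inside the object's own serialization methods is to lead with a version byte. The sketch below is an assumption about how that could be done, not something the formatter ships with; VersionedPoint, its fields, and the "Y was added in version 2" history are all invented for illustration. The method names mirror ICustomBinarySerializable.

```csharp
using System;
using System.IO;

public class VersionedPoint
{
    private const byte CurrentVersion = 2;

    public int X;
    public int Y; // pretend this field was added in version 2

    public void WriteDataTo(BinaryWriter writer)
    {
        writer.Write(CurrentVersion); // one extra byte per message
        writer.Write(X);
        writer.Write(Y);
    }

    public void SetDataFrom(BinaryReader reader)
    {
        byte version = reader.ReadByte();
        if (version > CurrentVersion)
            throw new InvalidDataException("Unknown version " + version);
        X = reader.ReadInt32();
        // Older data has no Y on the wire, so fall back to a default.
        Y = version >= 2 ? reader.ReadInt32() : 0;
    }
}

public static class VersioningDemo
{
    public static void Main()
    {
        MemoryStream stream = new MemoryStream();
        BinaryWriter writer = new BinaryWriter(stream);
        new VersionedPoint { X = 3, Y = 4 }.WriteDataTo(writer);
        writer.Flush();

        stream.Position = 0;
        VersionedPoint copy = new VersionedPoint();
        copy.SetDataFrom(new BinaryReader(stream));
        Console.WriteLine(copy.X + "," + copy.Y); // 3,4
    }
}
```

    The cost is one byte per message, which matters when the whole point of the exercise is small messages, so it is a tradeoff rather than a free fix.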

     

    Endianness

    One of the interesting things here is dealing with the length. I have done this using a quite unsafe (but faster) solution.

        [StructLayout(LayoutKind.Explicit)]
        public struct IntToBytes
        {
            public IntToBytes(Int32 _value) { b0 = b1 = b2 = b3 = 0; i32 = _value; }
            public IntToBytes(byte _b0, byte _b1, byte _b2, byte _b3) {
                i32 = 0;
                b0 = _b0;
                b1 = _b1;
                b2 = _b2;
                b3 = _b3;
            }
            [FieldOffset(0)]
            public Int32 i32;
            [FieldOffset(0)]
            public byte b0;
            [FieldOffset(1)]
            public byte b1;
            [FieldOffset(2)]
            public byte b2;
            [FieldOffset(3)]
            public byte b3;
        }

    This has endian problems if you use it across machines that have different endianness, say Mono on a PPC vs the CLR on x86. One could easily get around this by just using BitConverter instead (or doing some binary arithmetic if you miss having real reasons for doing so :)). For us however most of these objects are being serialized between processes on the same machine, so it's not an issue.
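    The "binary arithmetic" route could look something like this. It is a sketch rather than a drop-in replacement for IntToBytes: the class and method names are hypothetical, and fixing the wire order as little-endian is an assumption made to match what x86 already produces.

```csharp
using System;

public static class LengthPrefix
{
    // Always writes the least significant byte first, whatever the CPU's byte order.
    public static void Write(byte[] buffer, int value)
    {
        unchecked
        {
            buffer[0] = (byte) value;
            buffer[1] = (byte) (value >> 8);
            buffer[2] = (byte) (value >> 16);
            buffer[3] = (byte) (value >> 24);
        }
    }

    public static int Read(byte[] buffer)
    {
        return buffer[0] | (buffer[1] << 8) | (buffer[2] << 16) | (buffer[3] << 24);
    }
}

public static class LengthPrefixDemo
{
    public static void Main()
    {
        byte[] buffer = new byte[4];
        LengthPrefix.Write(buffer, 41);
        Console.WriteLine(BitConverter.ToString(buffer)); // 29-00-00-00
        Console.WriteLine(LengthPrefix.Read(buffer));     // 41
    }
}
```

    Shifts operate on the value rather than on its memory layout, which is why the result does not depend on the machine.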

     

    Copying of Data

    Another problem (read: decision) has to do with how the formatter deals with the stream itself internally. It copies data off the stream into an internal memory buffer so that it can reuse the same BinaryReader/BinaryWriter every time. This makes it non-reentrant and forces the copy, but in testing with many very small messages the copying of the data turned out to be faster than creating a new reader/writer on the original stream on every iteration. This may turn out differently for you; I will leave it as an exercise for the reader to change this (I promise it won't take more than 5 minutes).

     

    Typing

    It's a lot of typing in your objects (we can work around this with some IL generation), but that's a whole other post, isn't it?

     

    Anyways I hope people enjoy this and can find a niche place of their own to use such a strategy.

  • DevTeach

     

    Looks like I will be at DevTeach in Montreal this December. I will be doing 4 talks (so much to keep in my head at once). There are many other great people showing up that I really look forward to seeing.

     

     

    In the Agile Track I will do a talk

     

    "TDD in a DbC World"

    Design by Contract is slowly moving its way into the mainstream. Many wrongfully find Test Driven Development and Design by Contract to be in conflict with each other.

    This session will familiarize the audience with some basic concepts of Design by Contract and the use of a theorem prover for the static checking of contracts. Discussion will then look in more depth at how we can maintain a Test First mentality in a Contract First world.

     

    This talk is the same as the one I am doing in the alt.net track at QCon

     

     

    The rest of the talks are all in the Architecture Track:

    Domain Driven Design Chalk Talk

    We as developers and designers face increasingly more difficult problem spaces. By creating models around these problems we can create better, more flexible, longer lasting, and further distilled solutions to these problems. Domain-Driven Design is a formalization of this process.

    This talk introduces many of the basic patterns in Domain-Driven Design but instead of focusing on the patterns themselves it focuses on the interactions and intentions of the patterns. In other words, we will talk about "entities" for about 30 seconds before we get down and dirty on some real life problems and handle the tough stuff like determining aggregate boundaries and the roles of application services.

    A novice should be able to take away something from this talk, but then again so should an expert.

     

    I am particularly looking forward to this one as I have done a few of these in the past but never actually "prepared" for one. After watching the video from alt.net this weekend there are definitely some places where having a list of what I want to talk about will come in handy.

     

    The Non-Functional Juggler

    This is not just me making failed attempts at keeping flaming knives in the air, although that would probably be more entertaining.

    Non-Functional specifications are at the core of any architecture. Learning to balance non-functional specifications with each other and align them with business needs is the most important skill an architect can possess.

    The presentation looks at some of the varying types of non-functional specifications, how they interact with each other, and how you as an architect can determine the level of success for your project by managing them.

     

    Command Query Separation

    Bertrand Meyer introduced the concept of Command and Query separation to Design by Contract nearly 30 years ago. Command and Query separation need not only apply at a micro-level to our code but should be a key architectural theme in our systems.

    This presentation, after defining Command and Query separation as a theme, looks at a few common architectures and how we can improve them through the strong use of separation.

  • TCP: Buffer Management

    So a long time ago I wrote some posts on buffer management in TCP servers (it might be worth going back and reading them as they explain why buffer management is important). There have been a few comments lately asking for more complete examples. Funnily enough, that's just what I have been working on lately. So here is the first of a set of code drops of some code that will be open sourced (consider it MIT/MSPL now) in its entirety (a nice little framework for writing scalable servers, TCP transports, etc). The project doesn't have an official name yet and is being run on our local svn, so I will just upload a zip file for now.

     

    You can download the source here: http://codebetter.com/files/folders/codebetter_downloads/entry181822.aspx

     

    A quick run down of what is there.

    BufferManager.cs - The main Buffer Manager class

    BufferPool.cs - A class that abstracts a set of buffers to allow common operations

    BufferPoolStream.cs - An adapter to the stream interface for a BufferPool

     

    There are associated tests for these classes <> 75.

     

    I know this seems like a lackluster post, but the code is worth going through. And if anyone has a good name for this library, let me know.

  • Impedance Mismatch Reframing

     

    This is a reply to Stephen Forte's post Impedance Mismatch from a ways back. I would have posted about it sooner but I sadly just saw it today when a co-worker, Stefan Moser, linked it over to me. I know that this debate has become quite heated through the community and as such will refrain from personal attacks (such as those unfortunately experienced by Julia Lerman) and focus solely on the technical merits of the post.

     

    My first problem with ORMs in general is that they force you into a "objects first" box. Design your application and then click a button and magically all the data modeling and data access code will work itself out. This is wrong because it makes you very application centric and a lot of times a database model is going to support far more than your application.

     

    Well I wouldn't say that this is a problem with ORMs per se but a problem with some tools. Those who are using Domain Driven Design are certainly not using this methodology. One of the main reasons I like to tell people to use DDD is that they can design their data storage mechanisms in parallel to their domain model, seeking an optimal solution to each. In other words we should be embracing the impedance mismatch and doing what is best on both sides. The paragraph then continues with

     

    In addition an SOA environment will also conflict with ORM.

     

    I do not agree with this in any way, shape or form but am happy to leave it open to "the many definitions of SOA". I think it can quite easily be done if you follow solid command query separation. Udi Dahan gives a nice discussion of this on his blog.

    Later in the article (I am jumping around a bit to keep my own post coherent)

     

    One of the biggest hassles I see with LINQ to SQL is the typical many-to-many problem. If I have a table of Ocean Liners, vessels,  and ports, I’ll typically have a relational linking table to connect the vessels and ports via a sailing. (Can you tell I am working with Ocean Freight at the moment?) The last thing I want at the object layer is three tables! (And then another table to look up the Ocean Liner that operates the vessel.) Unfortunately, this is what most tools give me. Actually I don't even want one table, I want to hook object functionality to underlying stored procedures. I really want a port object with a vessel collection that also contains the ocean liner information.

     

    The author discusses his experiences with Linq2Sql and then applies them to "what most other tools give me"; this is an unfortunate fallacy or a lack of research on available tooling. Linq2Sql is not a real "mapper", nor is what the author is referring to "mapping"; it is simply an Active Record implementation that is not using self-serving objects. This is what happens when mappers stay too close to the relational structure: they suck in terms of domain language and structure.

    If we were however to use a real mapper (let's say the one those notorious mafia guys are using) a quite different scenario would exist; a domain that sounds almost exactly like what is described as being wanted. This paragraph is also key in showing that research has not been done into Domain Driven Design by the author, I would bet that Stephen and Eric could have some really interesting discussions at the Advisory Council as Eric uses this exact problem domain as a naive starting point for examples in about half of his book.

    A more serious problem is shown though in the author's propensity towards a relational bias when domain objects are called "tables". Why would anyone have a domain full of "tables"? These are behavioral objects. Unless this misunderstanding of what a domain model is gets corrected, the rest of what a domain model is or does will never make any sense.

    A further lack of understanding of Domain Driven Design is shown with the statement:

     

    ORM is real good for CRUD and real bad at other things.

     

    Again I believe the author has become confused between ORM and Active Record for some reason. I would never under any circumstances recommend that someone use Domain Driven Design for a CRUD app, as there are easier ways (like using Active Record). DDD is hard and often painful; it is costly up front and should only be used in domains that can justify its up-front costs in maintainability.

     

    Although it may be surprising, it is my belief that the author is actually a Domain Driven Design aficionado but has just not realized it yet.

     

    I prefer to build the application's object model and the data model at about the same time, with a "whiteboarding" approach that outlines the flows of data and functionality across the business process and problem set.

     

    It is quite common from an "object first" perspective to do database and code modeling either in small iterations or in parallel, where a team of object experts focuses on the domain model and the best way to model the data in order to support transactional behaviors while a team of database experts focuses on how best to store the data given their own set of requirements. These types of sessions would in fact be prescribed in an agile team, and the small "whiteboarding" sessions are absolutely prescribed by Domain Driven Design.

     

    Maybe it is the MBA talking but I tend to be "business and customers first" when I design a system. (Those of you that know me know that I have designed some very large and scalable systems in my day.)

     

    This is one of the core beliefs of Domain Driven Design. The primary example would be the creation of a Ubiquitous Language in order to ease communications between the "business and customers" and the team.

     

    What I am saying (and have been saying for a long time) is that we should accept, no, embrace the impedance mismatch!  While others are saying we should eradicate it, I say embrace it.

     

    Again we are back in agreement with Domain Driven Design. I like to look at Domain Driven Design as being an orthogonal architecture: my domain survives anything that is moved around it, as it is the core of my business and where the largest amount of my investment has gone...

     

     

    We come now to where the author is unfortunately not in line with DDD but perhaps can be moved. The only way that one can reach an orthogonal architecture is to ensure the purity of the domain model. The OLTP RDBMS will eventually decline in popularity. What happens when I want to move to, say, "the cloud" and just store my aggregate roots as XML? This is a perfectly valid and extremely effective architecture. If I favor the RDBMS side of the impedance mismatch too heavily, then this change will not be orthogonal to my domain and will as such be extremely costly. The author may disagree with my reasoning, as he points out:
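
    As a rough sketch of what "orthogonal" means here, the domain can depend on nothing more than a small repository interface, and the storage behind it can be swapped for one that keeps each aggregate root as an XML document. Everything below is an assumption made for illustration: the Shipment aggregate, the IShipmentRepository interface, and the in-memory dictionary standing in for whatever document store "the cloud" would provide.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical aggregate root; it knows nothing about how it is stored.
public sealed class Shipment
{
    public Guid Id { get; }
    public string Destination { get; private set; }

    public Shipment(Guid id, string destination)
    {
        Id = id;
        Destination = destination;
    }

    public void Reroute(string newDestination)
    {
        if (string.IsNullOrWhiteSpace(newDestination))
            throw new ArgumentException("A destination is required.", nameof(newDestination));
        Destination = newDestination;
    }
}

// The only thing the domain sees of persistence.
public interface IShipmentRepository
{
    Shipment FetchById(Guid id);
    void Save(Shipment shipment);
}

// Stores each aggregate root as an XML document keyed by id.
// The dictionary is a stand-in for a real document or blob store.
public sealed class XmlShipmentRepository : IShipmentRepository
{
    private readonly Dictionary<Guid, string> _documents = new Dictionary<Guid, string>();

    public void Save(Shipment shipment)
    {
        var xml = new XElement("shipment",
            new XAttribute("id", shipment.Id),
            new XElement("destination", shipment.Destination));
        _documents[shipment.Id] = xml.ToString();
    }

    public Shipment FetchById(Guid id)
    {
        string raw;
        if (!_documents.TryGetValue(id, out raw)) return null;
        var xml = XElement.Parse(raw);
        return new Shipment((Guid)xml.Attribute("id"), (string)xml.Element("destination"));
    }
}
```

    An RDBMS-backed implementation of the same interface could replace this one without the Shipment class changing at all, which is the point of keeping the domain model pure.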

     

    ORM tools should evolve to get closer to the database, not further away.

    and

    Developers who write object oriented and procedural code like C# and Java have trouble learning the set-based mathematics theory that govern the SQL language. Developers are just plain old lazy and don't want to code SQL since it is too "hard." That is why you see bad T-SQL: developers try to solve it their way, not in a set-based way.

    and

    So ORMs are trying to solve the issue of data access in a way that C# and VB developers can understand: objects, procedural, etc.  That is why they are doomed to fail. The further you abstract the developer from thinking in a set-based way and have them write in a procedural way and have the computer (ORM) convert it to a set-based way, the worse we will be off over time.

     

    Well, I think I have already discussed the first of these points pretty well: by moving closer to the database we break our hopes of an orthogonal architecture. The second comment, although it sounds like it came from a grand and mighty SQL wizard sent down by the gods to lift us heathens from our sinful ways, is actually a red herring, as is the third when framed properly.

    I do know relational algebra (yes, I can tell you what an anti-join is) and I challenge anyone to show me the notation for an insert. While one could argue it can be involved with, say, a delete by PK/FK or an update by PK, it is for all intents and purposes useless in the process of writing to a properly normalized database; these operations tend to be procedural regardless. I will admit there are times where it can come in handy, but they are by far the minority. The relational algebra is focused on reading data and manipulating sets.

    As many who have had long post-conference talks over beer with me know, I consider any query complex enough to come close to requiring thought about the relational algebra to be a report. Reports are not expressed within my domain and may or may not be read from the same data source (I often use an eventually consistent reporting model specifically for the purpose of running such queries). I often take this to extremes: my repositories in an ideal world have a single read method, FetchAggregateByUniqueId. Anything that searches in a more complex way is deemed a report and sits outside of this (usually as a small mapper that returns DTOs that match screen shapes, not domain shapes, but that provide the appropriate aggregate ids for writes to be possible). My "reports" all make very strong use of SQL and relational algebra; my domain has no need to know that this exists, as it is essentially a write-only model. I could go much more into this but it is another post.
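
    A minimal sketch of that split might look like the following. Only the FetchAggregateByUniqueId name comes from the text above; the generic repository interface, the OverdueInvoiceRow DTO, and the in-memory report (standing in for a SQL query against a reporting store) are hypothetical and only meant to show the shape of the idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Write side: the only read a repository offers is by aggregate id.
public interface IRepository<TAggregate>
{
    TAggregate FetchAggregateByUniqueId(Guid id);
    void Save(TAggregate aggregate);
}

// Read side: a DTO shaped like a screen, not like the domain.
// It carries the aggregate id so a later write can find the aggregate.
public sealed class OverdueInvoiceRow
{
    public Guid InvoiceId { get; set; }
    public string CustomerName { get; set; }
    public decimal AmountDue { get; set; }
}

public interface IOverdueInvoiceReport
{
    IReadOnlyList<OverdueInvoiceRow> OverdueBy(decimal minimumAmount);
}

// In-memory stand-in for what would normally be SQL against a reporting store.
public sealed class InMemoryOverdueInvoiceReport : IOverdueInvoiceReport
{
    private readonly IEnumerable<OverdueInvoiceRow> _rows;

    public InMemoryOverdueInvoiceReport(IEnumerable<OverdueInvoiceRow> rows)
    {
        _rows = rows;
    }

    public IReadOnlyList<OverdueInvoiceRow> OverdueBy(decimal minimumAmount) =>
        _rows.Where(r => r.AmountDue >= minimumAmount)
             .OrderByDescending(r => r.AmountDue)
             .ToList();
}
```

    The screen works with OverdueInvoiceRow values, and when the user acts on one, its InvoiceId is what gets handed to FetchAggregateByUniqueId on the write side.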

    Getting back to the article, the author does however end off with a great quote from Ted Neward:

     

    "Developers [should] simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access (such as "raw" JDBC or ADO.NET) to carry them past those areas where an O/R-M would create problems."

     

    This is great advice ... just remember, if you do it, to hide it from your domain and to use it sparingly, as you may not always have an RDBMS sitting behind you, and if you don't, these set-based operations may be quite difficult to implement.

  • Bellware Driven Design

    When I was down in Seattle last week Scott Bellware did a talk about BDD for a few people. It's not a very formal talk (which I prefer) and it's a bit slow to start but there are some gems in here. My camera died after the first hour, but it is definitely worth checking out.

     

    Enjoy!

     

    Posted Jul 19 2008, 08:19 PM by Greg with 12 comment(s)
  • Alt.Net Canada

     

    So it's actually going to happen! August 15-17 in Calgary. Registration is now open! http://www.altnetconfcanada.com/

     

    Registration is now open to the first hundred people so forget your Canada Day celebrations and sign on up!

     

     

    btw: for those from the states, yes people in Calgary live in igloos, if you have never stayed in an igloo I would highly recommend the experience.

  • DDDD Moved

    After some conversations with Scott and others I have decided that I will be writing up a lot more on DDDD (I have already started). I will be releasing it under a Creative Commons license as opposed to going with a brick and mortar publisher. Over the next few weeks I will begin pushing stuff out to a small group for review. The completed work will be available for download on my blog. There will also be at least 1 reference application provided.

    My reasoning for releasing under Creative Commons is that I want to get this out to as many people as possible. I may offer one of those "get a printed copy of this" or something but am more focused on trying to get the ideas to market quickly (my estimate with a brick and mortar publisher was almost 2 years ... I think I can do it quicker otherwise).

    I am looking for a small group of reviewers (<10) to read through things and provide feedback as they are written over the next few months. Drop me an email offline.

    Also as I am new to this if anyone has suggestions (as an example I was going to setup JIRA for reviewer comments) please let me know, or if you are a professional editor who wants to help my atrocious grammar that would be appreciated too :-)

    Posted Jun 20 2008, 07:10 PM by Greg with 8 comment(s)
  • Dynamic Languages vs Static Verification

    At alt.net Seattle, as some may remember, I was doing a bunch of interviews for infoq.com. One of those quick videos was a talk with Rustan Leino, Mike Barnett, John Lam, and Matt Podwysocki about dynamic languages and static verification. This came from the starting fish bowl on polyglot programming. I had to cut it a bit short in terms of time because John had to go but there are some interesting thoughts brought out (in particular the annealing of software over time).  Anyways ... here is the video, enjoy!

     

     

  • devTeach Talk

    Here is my devTeach talk ... I got a bit of a late start and had a lot of material to try to get in so I had to push away from a few good discussions but I will answer those discussions in a post here ... and be kind I only knew I was speaking 2 weeks in advance ;-)

     

    Enjoy!

     

  • EF Long Term Plans

    I was reading through what is actually a reasonable comparison of EF to other technologies on Dan Simmons' blog.

    Dave, Jeremy, and Jimmy have already discussed many issues but ...

     

    One bit caught my attention:

    Long-term we are working to build EDM awareness into a variety of other Microsoft products so that if you have an Entity Data Model, you should be able to automatically create REST-oriented web services over that model (ADO.Net Data Services aka Astoria), write reports against that model (Reporting Services), synchronize data between a server and an offline client store where the data is moved atomically as entities even if those entities draw from multiple database tables on the server, create workflows from entity-aware building blocks, etc. etc.  Not only does this increase the value of the data model by allowing it to be reused for many parts of your overall solution, but it also allows us to invest more heavily in common tools which will streamline the development process, make developer learning apply to more scenarios, etc.  So the differentiator is not that the EF supports more flexible mapping than nHibernate or something like that, it's that the EF is not just an ORM--it's the first step in a much larger vision of an entity-aware data platform.

     

    DDDD is something very similar to this but I think they have completely missed the boat. I have a single slide in my deck from devTeach that summarizes my objections quite succinctly.

     

    [Slide image: DDDD]

     

    I have since rewritten this slide to be more generic: "A single model cannot possibly be appropriate for all facets of your application including transactional behaviors, searching, and reporting."

     

    In DDDD I deal with this by recognizing that the Entity is of limited importance and should be different in different places ... It is what happens to the entity that REALLY matters and it is the recognition and the making explicit of EVENTS in the domain that allows you to easily support multiple concurrent parallel models. These events should not be automatically generated object->field changed messages but should be DOMAIN CONCEPTS.
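
    To illustrate the difference between a generated "field changed" message and an event that is a domain concept, here is a minimal sketch. The IDomainEvent marker interface, the CustomerRelocated event, and the UncommittedEvents collection are all hypothetical names invented for this example; they are not taken from any DDDD reference material.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical marker interface for events raised by the domain.
public interface IDomainEvent { }

// A domain concept the business would recognise,
// as opposed to a generic "Customer.City changed" message.
public sealed class CustomerRelocated : IDomainEvent
{
    public Guid CustomerId { get; }
    public string NewCity { get; }

    public CustomerRelocated(Guid customerId, string newCity)
    {
        CustomerId = customerId;
        NewCity = newCity;
    }
}

public sealed class Customer
{
    private readonly List<IDomainEvent> _uncommitted = new List<IDomainEvent>();

    public Guid Id { get; }
    public string City { get; private set; }

    public Customer(Guid id, string city)
    {
        Id = id;
        City = city;
    }

    // The behavior is explicit, and so is the event it produces.
    public void Relocate(string newCity)
    {
        if (string.IsNullOrWhiteSpace(newCity))
            throw new ArgumentException("A city is required.", nameof(newCity));
        if (newCity == City) return; // nothing happened in the domain

        City = newCity;
        _uncommitted.Add(new CustomerRelocated(Id, newCity));
    }

    // Other models (search, reporting) can be fed from these events.
    public IReadOnlyList<IDomainEvent> UncommittedEvents => _uncommitted.AsReadOnly();
}
```

    A reporting or search model can subscribe to CustomerRelocated and shape its own data however it likes, without ever sharing the transactional model's entity.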

     

    Let me say it for the 1000th time: if you are reporting off your transactional model you are seeking trouble!

    On the DDD list people often ask "How do I use my domain to report?" ... the answer is "You don't": they are different models with different goals. It pains me that MS intends to push people into what is an anti-pattern, even for small systems.

     

     

    Jimmy Bogard was also right on the money when he mentioned that I should not expose my model outside of my Bounded Context. I highly doubt a system like EF and what they suggest would work beyond trivial cases, and it is (as proposed) one small step up from using sprocs and linked servers as your integration model.

     

    I could say MUCH more about this but instead I will try to rework my talk a bit in Victoria Wednesday to try to include some of this.
