CodeBetter.Com

Patrick Smacchia [MVP C#]

  • Using NDepend on a large project: a success story

     

    Sébastien Andreo just wrote a blog post explaining how some NDepend features made life easier for his team. Sébastien is a software architect at Siemens Healthcare in Germany, and his team has been using NDepend for over 18 months. The team is massive, with more than 100 developers located on several sites, and the project is around 7 years old. With around 2.5M Lines of Code, the code base is very large. For comparison, the entire .NET Framework v3.5 (including ASP.NET, Windows Forms, WPF, WCF...) weighs 8.5M IL instructions, while these 2.5M Lines of Code compile to 13M IL instructions (+ some C++ code that compiles to x86).

    Sébastien explains that, thanks to the fine-grained impact estimation made possible by the Code Query Language (CQL), the team can now rationalize deliveries and avoid accidentally breaking the build. Clearly estimating the consequences of changes improves communication between co-workers and strengthens the team's ability to anticipate. With NDepend, the team can precisely estimate which parts of the code need more attention (typically brand new code, complex code, entangled code, refactored code...). Thus they can increase the value of their smoke tests and focus their automated testing effort.

    Sébastien also confirms that all our work on performance pays off. On such a large code base, the analysis duration dropped from 25 minutes to 6 minutes (reading from a ClearCase virtual file system, which hurts performance). On smaller code bases, the analysis duration is a matter of a few seconds. In 2009 we plan more performance improvements, because users' time is precious.

    Sébastien ends his post with a wish list, and I take the opportunity to answer it here:

    • Plug-in mechanism for runtime dependency checks (private, Spring.NET, …): This is planned for the mid-term, and I will soon write some blog posts to explain our stance on the subject.
    • Multi-language support (Java seems to be on track with www.xdepend.com): Yes, XDepend (NDepend for Java) will see the light of day within the next few months. A public beta will be available within the next few weeks.
    • Complex CQL query support (multiple selects in one query): Yes, query composition will be delivered in 2009, although we cannot give a precise date yet.
    • In the VisualNDepend class browser, the possibility to apply a filter (a CQL filter perhaps), so that with a lot of assemblies you can easily find the targeted one without scrolling through the whole list: Yes, we are currently working on improving the usability of the CQL results panel to add even more flexibility to VisualNDepend.
    • Plug-in mechanism for the NDepend project: you currently support Visual Studio, but what about MonoDevelop, SharpDevelop, Eclipse or others…? I am not sure that we will support IDEs other than Visual Studio in the future. However, more integration is part of our plan for 2009, and some integration APIs will be made public for those who want to integrate NDepend capabilities into other environments.


     


  • What is Microsoft waiting for to provide a decent path API?

     
    I was recently browsing the source code of the Managed Extensibility Framework and realized that this future part of .NET 4, full of tricky and advanced ideas, naively relies on strings to describe file and directory paths. It seems that .NET 4 will miss the need for a decent path API. There is the class System.IO.Path, but it is feature-limited, full of flaws and pitfalls (not to say bugs), and it encourages users to encode their paths as raw strings. It seems to me that using strings to encode paths is as primitive as using String.IndexOf("<tag>") to parse XML.

     

    Like many other applications, NDepend needs to perform many complex path operations. These include:

    • Relative / absolute path conversion + Path rebasing.
    • Path normalization API
    • Path validity check API
    • Path comparison API
    • List of path operations (TryGetCommonRootDirectory, GetListOfUniqueDirsAndUniqueFileNames, list equality…)

    In mid-2007, during the development of NDepend, we came to the point where relying on the naïve and flawed System.IO.Path was just not acceptable anymore. We then invested a few weeks in building our own strong-typed path library, NDepend.Helpers.FileDirectoryPath, which we released as open source on CodePlex. The library is bug-free (at least there is no known bug) and 100% covered by tests, but it still didn't attract the masses (only 477 downloads in 14 months). Moreover, I don't know of any equivalent library (do you?).
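    NDepend.Helpers.FileDirectoryPath is a .NET library, and its API is not reproduced here. As a rough illustration of what strong-typed paths buy over raw strings, here is a minimal Python sketch, assuming Windows-style absolute paths: hypothetical DirectoryPath and FilePath types that normalize on construction, compare case-insensitively, never compare equal across kinds, and implement relative/absolute conversion plus a TryGetCommonRootDirectory-like helper. All names are illustrative.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List, Optional


def _normalize(text: str) -> PureWindowsPath:
    """Parse an absolute Windows path, collapsing '.', '..' and repeated separators."""
    raw = PureWindowsPath(text)
    if not raw.is_absolute():
        raise ValueError(f"expected an absolute path, got {text!r}")
    parts: List[str] = []
    for part in raw.parts[1:]:  # parts[0] is the anchor, e.g. 'C:\'
        if part == "..":
            if not parts:
                raise ValueError(f"path climbs above its root: {text!r}")
            parts.pop()
        else:
            parts.append(part)
    return PureWindowsPath(raw.anchor, *parts)


class _AbsolutePath:
    """Shared behavior of the two hypothetical strong-typed path kinds below."""

    def __init__(self, text: str) -> None:
        self._path = _normalize(text)

    def __eq__(self, other: object) -> bool:
        # A file never equals a directory; PureWindowsPath itself compares
        # case-insensitively, like the Windows file system.
        return type(other) is type(self) and self._path == other._path  # type: ignore[attr-defined]

    def __hash__(self) -> int:
        return hash((type(self).__name__, self._path))

    def __str__(self) -> str:
        return str(self._path)


class DirectoryPath(_AbsolutePath):
    def combine(self, relative_file: str) -> "FilePath":
        """Resolve a relative file path such as '..\\Bin\\A.dll' against this directory."""
        return FilePath(str(self._path / relative_file))


class FilePath(_AbsolutePath):
    def relative_to(self, base: DirectoryPath) -> str:
        """Express this file relative to `base`, e.g. '..\\Bin\\A.dll'."""
        target, origin = self._path.parts, base._path.parts
        if target[0].lower() != origin[0].lower():
            raise ValueError("files on another drive have no relative path")
        common = 1
        while (common < min(len(target), len(origin))
               and target[common].lower() == origin[common].lower()):
            common += 1
        steps = [".."] * (len(origin) - common) + list(target[common:])
        return "\\".join(steps if steps and steps[0] == ".." else ["."] + steps)


def try_get_common_root_directory(dirs: Iterable[DirectoryPath]) -> Optional[DirectoryPath]:
    """Deepest directory containing every path in `dirs`; None if empty or drives differ."""
    parts_list = [d._path.parts for d in dirs]
    if not parts_list:
        return None
    first = parts_list[0]
    depth = len(first)
    for parts in parts_list[1:]:
        if parts[0].lower() != first[0].lower():
            return None
        i = 1
        while i < min(depth, len(parts)) and parts[i].lower() == first[i].lower():
            i += 1
        depth = i
    return DirectoryPath(str(PureWindowsPath(*first[:depth])))
```

    For example, FilePath(r"C:\Dev\App\Bin\A.dll").relative_to(DirectoryPath(r"C:\Dev\App\NDepend")) gives ..\Bin\A.dll, and combining that with DirectoryPath(r"D:\Build\App\NDepend") rebases it to D:\Build\App\Bin\A.dll. Making files and directories distinct types is the design point: a method expecting a directory cannot silently receive a file path string.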

     

    The conclusion is that in 2009 and beyond, doing something as mainstream as handling path operations in .NET will still be a pain!

     

     

     

    This reminds me of the C# non-nullable types debate, where half of the bugs in .NET code still come from the pesky NullReferenceException. IMHO, on this issue, the future .NET 4 contracts API will help a bit, but certainly not as much as proper non-nullable strong typing integrated at the language and platform level. Sadly, Anders Hejlsberg acknowledged that the absence of non-nullable types was their biggest mistake in designing C# and .NET, and dashed any hope of seeing them one day.

     

        A.H: You sort of end up going, well ok, if we ever get another chance in umpteen years to build a new platform, we’ll definitely get this one (i.e non-nullable types) right.

     

    The problem is that both Microsoft and its partners have now invested in .NET for almost a decade and are unlikely to shift to another platform for a very long time. Maybe I am narrow-minded, but I really can't imagine a post-.NET era.

    • The last few years have shown that investment in research and innovation in the Java world cannot compare to what MS is doing. Since LINQ in 2007, and even .NET generics in 2005, Java no longer leads the way in technical innovation, and it is unlikely to reverse this trend.
    • Microsoft invests massively in .NET for all its own products, and it looks like this is just the beginning. I can't imagine Microsoft replacing .NET with something else soon!
    • And apart from the Java community and Microsoft, who else is able to produce a mainstream platform today?

    Debug.Assert(… != null); or if you prefer CodeContract.Requires( … != null) will continue to be our friends for a very, very long time.

     

     

  • Ship It Often vs. TDD

     
    Ayende recently answered the question: what would be the bare minimum aspect of an Agile project? His answer: Ship it Often.

     

    Unsurprisingly, most developers would instead have answered TDD. Personally, I cannot answer this, just as I cannot say whether, in a good meal, the main course is more important than the dessert.


     

    I wrote about this subject a year ago in the article How to avoid regression bugs while adding new features? Basically, in that article I wanted to underline the following empirical facts:

    • Code that works in production will likely continue working properly in the future as long as it is not touched.
    • The vast majority of new defects introduced by a new release come from refactored and added code since the last release.
    • As a consequence, before releasing, the team should focus its attention on code refactored and added since the last release (i.e. the diff).

    This indirectly advocates the principle of small iterations and shipping often: the shorter the iteration, the smaller the diff to focus on. This is one of our top goals for the development of NDepend. The release notes page shows that we did 11 major releases and 19 minor releases during the last 24 months.
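    The "diff" can be made concrete. A minimal Python sketch, assuming each build is summarized as a hypothetical snapshot mapping a method's full name to a hash of its body: added, refactored and removed methods then fall out of plain set operations. NDepend's build comparison works on its own analysis results; this only illustrates the idea.

```python
import hashlib
from typing import Dict, Set, Tuple

# Hypothetical snapshot format: method full name -> hash of the method body.
Snapshot = Dict[str, str]


def fingerprint(body: str) -> str:
    """Hash a method body so that any edit to it changes the fingerprint."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def diff(previous: Snapshot, current: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, changed, removed) method names between two builds."""
    added = set(current) - set(previous)
    removed = set(previous) - set(current)
    changed = {name for name in set(current) & set(previous)
               if current[name] != previous[name]}
    return added, changed, removed
```

    Everything outside added and changed is, by the empirical facts above, code that already worked in production; the shorter the iteration, the smaller these two sets.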

     

     

    By focusing our attention on the diff, I concretely mean:

    • Code-reviewing the diff.
    • Smoke-testing the features corresponding to the code diff.
    • Writing unit tests to ensure high coverage of the diff.

    This is just common sense! At this point, the need for high test coverage meets the need for short iterations: when iterations are short, the amount of fresh code to test is relatively small, and this is the best motivation to actually write tests.

     

     

    NDepend can help both with code-reviewing the diff and with ensuring high coverage of the diff. For code-reviewing the diff, you can use the build comparison feature coupled with the ability to plug a source code diff tool into NDepend:

     

     

    To ensure high coverage of the diff, you can combine the build comparison feature with the import of test coverage metrics. I wrote a full blog post on this: Are you sure added and refactored code is covered by tests? Basically, the idea is to use the following CQL query to ask for changes not properly covered by tests:

     

    SELECT METHODS WHERE            // Select methods where
      PercentageCoverage < 100 AND  //  not 100% covered by tests and
      (CodeWasChanged OR            //  was refactored or
       WasAdded)                    //  was added.
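    To make the semantics of this query explicit, here is the same filter sketched in Python over hypothetical per-method records. The MethodInfo type and its fields are illustrative stand-ins for the CQL metrics PercentageCoverage, CodeWasChanged and WasAdded, not an NDepend API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MethodInfo:
    """Hypothetical per-method record combining build comparison and coverage data."""
    name: str
    percentage_coverage: float
    was_added: bool
    code_was_changed: bool


def uncovered_changes(methods: List[MethodInfo]) -> List[str]:
    """Same filter as the CQL query: not fully covered AND (refactored OR added)."""
    return [m.name for m in methods
            if m.percentage_coverage < 100 and (m.code_was_changed or m.was_added)]
```

    Untested code that was not touched since the previous build is deliberately not reported: the rule targets only the diff.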

     

     

     

  • Increase Build Process added-value with Static Analysis

     

     

    The tool NDepend gathers data from a code base. This includes code quality metrics, test coverage statistics, componentization/architecture/dependencies, evolution and changes, state mutability, usage of third-party code and much more. The amount of data produced is proportional to the size of the code base and can become pretty big when a large application is analyzed. NDepend's added value lies in its ability to let users readily browse this huge amount of data. This way, developers and architects can know precisely what's really happening inside their shop and make decisions based on real facts, not on intuition and rumor. There are 2 distinct scenarios for browsing the data:

     

    • Through a report: the NDepend analysis can be integrated into a build process and can produce a customizable HTML report each time the analysis is run. The report is well suited to producing daily dashboards useful to every member of the team, even non-technical ones.
    • Through the stand-alone UI VisualNDepend: the VisualNDepend UI comes with several panels to visualize and interactively query information about the code base. This UI works hand-in-hand with Visual Studio and Reflector. VisualNDepend users are architects and developers who need to dig into the details of the code base at any time.

     

    I would like to present here some details about how to integrate NDepend into a build process, but let's first explain how NDepend can provide useful warnings about the health of a build process.

     

    NDepend warnings about the health of the build process

     

    These warnings can be found in the file $AnalysisResultDir$\InfoWarnings.xml, in the report section NDepend information and warnings and in the Error List panel of the VisualNDepend UI.

     


     

     

     

    By the health of the build process, we mean details that can reveal potential flaws. Concretely, this includes:

     

    Assembly versioning issues such as:

    • AssemblyA references AssemblyB v2.1 but only AssemblyB v2.0 is available.
    • AssemblyA references 2 versions of AssemblyB (which is not necessarily a bad thing, but it's still useful to be aware of such a situation).

     

    Assembly conflicts such as:

    • The name of my assembly's main module file is different from the logical name of my assembly.
    • Several different assemblies with the same name can be found (different = different hash code and/or different version).

     

    PDB file issues such as:

    • Missing PDB files.
    • PDB files and source code files not in sync.
    • PDB files and assemblies not in sync.



    Coverage file issues such as:

    • Corrupted or missing coverage files.
    • Coverage files and source code files not in sync.
    • Coverage files and assemblies not in sync.


    In the Error List panel of VisualNDepend, you can deactivate false warnings to avoid being warned again and again during future analyses.

     

     

     

     

    Running an Analysis with NDepend.Console.exe

     

    NDepend comes with 2 executables that reflect the report/UI duality mentioned above: NDepend.Console.exe and VisualNDepend.exe. NDepend.Console.exe can run an analysis. It is a classic console executable that takes command line arguments. The minimal input is an absolute path to the NDepend project file that defines the code base to be analyzed. Several command line arguments can be provided; they are listed here: http://www.ndepend.com/NDependConsole.aspx. Basically, these arguments let you override the folders where the assemblies to be analyzed are stored, override the output folder where the data produced by the analysis will be persisted, and provide an XSL sheet to customize the report.

     

    A simple exec command is all that is needed to integrate NDepend.Console.exe into your build process. The NDepend redistributable also comes with an NAnt task and an MSBuild task.
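    A minimal sketch of such an exec step, written in Python and assuming hypothetical install and project paths: it enforces the absolute project path that NDepend.Console.exe expects as its minimal input and passes any further arguments through unchanged. Only /EmitVisualNDependBinXml, mentioned later in this post, is used as an example argument; check the NDependConsole page for the real list rather than guessing flag names.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List, Sequence


def build_ndepend_command(console_exe: str, project_file: str,
                          extra_args: Sequence[str] = ()) -> List[str]:
    """Command line for NDepend.Console.exe; the project file must be an absolute path."""
    if not PureWindowsPath(project_file).is_absolute():
        raise ValueError(f"NDepend project path must be absolute: {project_file!r}")
    return [console_exe, project_file, *extra_args]


def run_ndepend(console_exe: str, project_file: str,
                extra_args: Sequence[str] = ()) -> int:
    """Run the analysis and hand its exit code back to the calling build script."""
    completed = subprocess.run(build_ndepend_command(console_exe, project_file, extra_args))
    return completed.returncode
```

    The returned exit code is simply passed back to the caller; how a given build server should react to it is left to the build script.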

     

    More information about integrating NDepend into a CruiseControl.NET build is provided in the CruiseControl.NET documentation here.

    Also, Laurent Kempé (MVP) took the time to detail NDepend integration into a TeamCity build process here (thanks Laurent :o)

     

     

    Analysis Options

     

    To handle real-world scenarios, there are several analysis options. Options can be tuned through the VisualNDepend > Project Properties panel (Ctrl+Shift+P). Options are then persisted into the NDepend project file and can be harnessed at analysis time.

     

    The first option is the ability to choose between absolute and relative paths to the folders where analyzed assemblies are stored. Analyzed assemblies include both your own assemblies and the third-party assemblies used by your application (like mscorlib.dll or Log4Net.dll). If you choose relative paths, paths are relative to the folder where the NDepend project is stored. This option is useful when the NDepend analysis is performed on several machines (build servers, developer machines…) where the root folder of the whole development shop can vary.

     

     

     

    Notice that the folders where the .NET Framework assemblies are stored are not affected by the relative path option: NDepend automatically adapts these folder paths to the .NET Framework installation of the current machine.

     

     

     

    In the VisualNDepend > Project Properties > Analysis sub-panel, you'll find 3 interesting options besides the project name and output folder.

     


     

    The Analysis Comparison option lets you define the previous analysis result against which the current analysis is compared. This is useful if you've defined CQL rules about the evolution of your code base, for example making sure that all new or refactored methods are 100% covered by tests:

    WARN IF Count > 0 IN SELECT METHODS WHERE
      (WasAdded OR CodeWasChanged) AND PercentageCoverage < 100

     

    Here, WasAdded OR CodeWasChanged means added or refactored compared to the reference analysis result. The result to compare with can be a particular result (like the analysis of the last release we made), a result made N days ago, or the last available analysis result. Below, I'll give details about detecting CQL rule violations in the report.
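    The three ways of choosing the baseline can be sketched as a small selection function. This Python sketch assumes a hypothetical history of past analysis results (timestamp plus output directory) and interprets "a result made N days ago" as the most recent result at least N days old; NDepend's own selection logic may differ.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class AnalysisResult(NamedTuple):
    """Hypothetical record of a past analysis: when it ran and where it was stored."""
    timestamp: datetime
    directory: str


def pick_baseline(history: List[AnalysisResult], now: datetime, mode: str,
                  days_ago: int = 0, pinned: Optional[str] = None) -> Optional[AnalysisResult]:
    """Choose the analysis result to compare against.

    'pinned': a particular result (e.g. the last release), identified by its directory.
    'days':   the most recent result at least `days_ago` days old.
    'last':   the most recent result available.
    """
    ordered = sorted(history, key=lambda r: r.timestamp, reverse=True)
    if mode == "pinned":
        return next((r for r in ordered if r.directory == pinned), None)
    if mode == "days":
        cutoff = now - timedelta(days=days_ago)
        return next((r for r in ordered if r.timestamp <= cutoff), None)
    if mode == "last":
        return ordered[0] if ordered else None
    raise ValueError(f"unknown mode {mode!r}")
```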

     

     

     

    The Code Coverage option lets you specify the NCover or Visual Studio coverage file(s) from which NDepend will gather test coverage statistics. More details about how to obtain these coverage files can be found in the NDepend documentation here: http://www.ndepend.com/Coverage.aspx

     

     

    The Source File Rebasing option is useful if the code compilation and the NDepend analysis are executed on different machines. NDepend gathers information from the assemblies, but also from the source files if they are available. NDepend knows about source files from the PDB files produced by the compilation, which contain absolute paths to source files. PDB files are primarily used by the debugger to link compiled IL code with source code. If the code compilation and the NDepend analysis are executed on 2 different machines, source file locations may differ between the 2 machines. In this case, the absolute source file paths contained in the PDB files must be rebased, hence the need for the Source File Rebasing option. More information about source file rebasing can be found here.
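    Conceptually, rebasing replaces the build machine's root folder with the local one in each path read from the PDB files. A minimal Python sketch, assuming Windows paths and a single hypothetical root mapping; it is not NDepend's implementation.

```python
from pathlib import PureWindowsPath


def rebase_source_path(pdb_path: str, build_root: str, local_root: str) -> str:
    """Map a source path recorded in a PDB on the build machine to the local machine.

    Paths outside `build_root` are returned unchanged. Comparison is
    case-insensitive, as on Windows.
    """
    path, old_root = PureWindowsPath(pdb_path), PureWindowsPath(build_root)
    try:
        relative = path.relative_to(old_root)  # raises ValueError if not under old_root
    except ValueError:
        return pdb_path
    return str(PureWindowsPath(local_root) / relative)
```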

     

     

    Report customization

     

    The report can be customized through the VisualNDepend > Project Properties > Report sub-panel. You can choose to activate and reorder 11 predefined sections, such as Application and Assemblies metrics, some diagrams (dependencies, metrics, abstractness vs. instability…) and more.

     


     

    It is recommended to deactivate the Type Metrics and Type Dependencies sections, since they can become pretty large if you have more than a thousand types in your application. Browsing type metrics and dependencies is a scenario better addressed through the VisualNDepend.exe UI.

     

     

    The CQL Queries and Constraints section is especially useful for detecting CQL rule violations. For each CQL rule, you can choose whether to display violations in the report (Active tickbox). In case of a violation, you can choose whether to display in the report the list of selected items (be careful, the list can be pretty long, hence the use of SELECT TOP 10 in predefined rules), statistics about this list, and a selection view of the selected items:

     

     

     

    Here is what a violated CQL rule looks like in the report:

     

     

     

     

     

     

    For most users, the classic report provides enough information, and when there is a need to dig into more details, the VisualNDepend UI comes to the rescue. Still, you can choose to build your own report by providing your own XSL sheet. It can be inspired by the one in $NDepend Install Dir$/CruiseControl.NET/ndependreport-ccnet.v2.xsl. Input information can be taken from the following XML files output by the analysis in the $AnalysisResultDir$ folder: TypesMetrics.xml, TypeDependencies.xml, InfoWarnings.xml (see above for more information on info/warnings), AssembliesDependencies.xml, AssembliesMetrics.xml, ApplicationMetrics.xml, and AssembliesBuildOrder.xml (which provides a build order for your assemblies if there are no dependency cycles). The tickbox VisualNDepend > Project Properties > Analysis > Keep XML files used to build reports and store warnings must be ticked; otherwise these XML files are deleted by NDepend after the report is built.
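    Deriving a build order such as the one in AssembliesBuildOrder.xml is a topological sort of the assembly dependency graph, and it exists only when the graph has no cycle. A minimal Python sketch using Kahn's algorithm over a hypothetical name-to-dependencies map (not NDepend's implementation):

```python
from collections import deque
from typing import Dict, List, Set


def build_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order assemblies so that each one comes after the assemblies it references.

    Raises ValueError when the dependencies contain a cycle, since then
    no build order exists.
    """
    nodes = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    remaining = {n: set(depends_on.get(n, ())) for n in nodes}  # unbuilt dependencies
    users: Dict[str, Set[str]] = {n: set() for n in nodes}      # reverse edges
    for n, deps in remaining.items():
        for d in deps:
            users[d].add(n)
    ready = deque(sorted(n for n, deps in remaining.items() if not deps))
    order: List[str] = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for user in sorted(users[n]):
            remaining[user].discard(n)
            if not remaining[user]:
                ready.append(user)
    if len(order) != len(nodes):
        raise ValueError(f"dependency cycle among: {sorted(nodes - set(order))}")
    return order
```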

     

    In the output folder, you'll also notice a file named CQLResults.xml that contains the information used to display CQL rule violations in the report. You might want to harness its XML content to automatically retrieve the results of some CQL rules.

     

    Finally, I would like to mention the under-the-hood possibility to get 100% of the information used internally by VisualNDepend. By providing the command line argument /EmitVisualNDependBinXml to NDepend.Console.exe, the file $AnalysisResultDir$\VisualNDepend.bin.xml will be built by the analysis. However, there is no guarantee that the XML schema of this file will remain compatible, and it will likely evolve in the future.

     

     

  • Advice on partitioning code through .NET assemblies

    The tenet is: reduce the number of your .NET assemblies to the strict minimum. A single assembly is the ideal. This is the case, for example, for Reflector and NHibernate, which both come as a single assembly.

     

    A lot has been said on this topic. In this article I dig into why using more namespaces and fewer assemblies for componentization is a good thing. Jeremy Miller does so too in this blog post. The point is that assemblies are physical while namespaces are logical. As a consequence, having N assemblies multiplies by N the burden of dealing with physical things. This burden consists of referencing N assemblies from a Visual Studio project, significantly slowing down the compilation of N Visual Studio projects, managing the deployment of N files, the need for the CLR to do N CAS security checks at startup time, etc. Personally, I have seen applications with up to 750 assemblies! Quote from Jeremy Miller: I took 56 projects one time and consolidated them down to 10-12 and cut the compile time from 4 minutes to 20 seconds with the same LOC

     

    I would like to discuss here the motivations behind creating an assembly. I will then illustrate these reasons through the choices we made for the assemblies of the NDepend code base. When trying to reduce the number of assemblies in your shop, it is a good idea to find a valid reason for the existence of each assembly of your code base. If no solid reason can be found, there is room for merging assemblies.

     

    Valid and Invalid reasons to create an assembly

     

    Valid reasons to create an assembly:

    • Tier separation: the need to run different pieces of code in different AppDomains or processes. The idea is to avoid overwhelming precious Windows process memory with large pieces of unneeded code.
    • Potential for loading large pieces of code on demand. This is an optimization made by the CLR: assemblies are loaded on demand. In other words, the CLR loads an assembly only when a type or a resource contained in it is needed for the first time. Here also, you don't want to overwhelm your Windows process memory with large pieces of code that are not needed most of the time.
    • Framework feature separation. In the case of a very large framework, users shouldn't be forced to embed every feature into their deployment package. For example, most of the time an ASP.NET process doesn't do any Windows Forms and vice versa, hence the need for 2 assemblies, System.Web.dll and System.Windows.Forms.dll. This is valid only for large frameworks with assemblies sized in MB. For example, the NUnit http://www.NUnit.com framework certainly does not need 25 assemblies for 15,000 Lines of Code! Quote from Jeremy Miller: Nothing is more irritating to me than using 3rd party toolkits that force you to reference a dozen different assemblies just to do one simple thing.
    • AddIn/PlugIn model: the need for physical separation of interface, factory and implementation.
    • Test/application code separation. If you are not releasing source code but only assemblies, you likely don't want to release tests. Not releasing test assemblies makes this easy.
    • When several assemblies have been created for the reasons above, they likely need to share some common code. Such shared code must be placed in a dedicated shared assembly.

    We could add the assembly as a unit of versioning, but IMHO the need for versioning is covered by the reasons enumerated above.

     

    Invalid reasons to create an assembly:

    • Assembly as a unit of development, or as a unit of test. Modern source control systems make it easy for several developers to work simultaneously on the same assembly (i.e. the same Visual Studio project). Here the unit should be the source file.
    • Automatic detection of dependency cycles between assemblies by MSBuild and Visual Studio. Tools such as NDepend can detect dependency cycles between the namespaces or types of an assembly.
    • Usage of internal visibility to hide implementation details. The public/internal visibility level is useful when developing a framework where you want to hide implementation details from the rest of the world. Your team is not the rest of the world, so you don't need to create assemblies especially to hide implementation details.
    • Usage of internal visibility to prevent usage from the rest of the application. If you want to prevent usage, and thus control the structure/dependencies of your code base, you should rather use dedicated tools such as NDepend.
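    Finding dependency cycles between the namespaces of a single assembly amounts to computing the strongly connected components of the namespace dependency graph. A minimal Python sketch with Tarjan's algorithm, assuming a hypothetical map from each namespace to the namespaces it uses; it illustrates the technique, not how NDepend does it.

```python
from typing import Dict, List, Set


def dependency_cycles(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Return groups of namespaces that depend on each other (Tarjan's SCC algorithm).

    A component of two or more namespaces, or a namespace that uses itself,
    is a dependency cycle.
    """
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    cycles: List[List[str]] = []

    def visit(node: str) -> None:
        index[node] = lowlink[node] = len(index)  # discovery order
        stack.append(node)
        on_stack.add(node)
        for succ in sorted(graph.get(node, ())):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:  # node is the root of a component
            component: List[str] = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            if len(component) > 1 or node in graph.get(node, ()):
                cycles.append(sorted(component))

    for node in sorted(graph):
        if node not in index:
            visit(node)
    return cycles
```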

    If you are interested in reducing the number of your Visual Studio projects/assemblies, I would suggest reading this blog post Hints on how to componentized existing code. It shows how to use some NDepend dependencies features to get some hints about which sets of assemblies should be merged.

     

     

    A case study

     

    The NDepend code base is split across 11 assemblies, and here is the dependency diagram of the NDepend assemblies (made by NDepend itself):

     

     
     

     

    Something you might notice is the term XDepend. We are currently building the XDepend product (aka NDepend for Java), which will be released in 2009. More information is available on the official website http://www.XDepend.com, and I will talk more about it in the future. This XDepend/NDepend duality leads to 2 different deployment packages for the 2 products, and this is why there are NDepend.Console.exe, VisualNDepend.exe and NDepend.Platform.DotNet.dll on one hand, and XDepend.Console.exe, VisualXDepend.exe and XDepend.Platform.Java.dll on the other hand.

     

    We need 4 executables instead of 2 mostly for terminology reasons: XDepend.Console.exe certainly makes more sense to an XDepend user than NDepend.Console.exe. All 4 executables are almost empty in terms of code. The console ones initialize the platform (Java or .NET) and start the analysis implemented in NDepend.Analysis.dll, while the Visual ones initialize the platform and start the UI implemented in NDepend.UI.dll.

     

    The need to initialize the platform (.NET or Java) is a motivation for isolating the code specific to each platform, hence the 2 assemblies NDepend.Platform.DotNet.dll and XDepend.Platform.Java.dll. This separation will also make it easier to support other platforms in the future (C++, Delphi…).

     

    The tool has 2 primary usages: analyzing code, and digging into analysis results through the UI. This is the motivation for having 2 different executable assemblies, NDepend.Console.exe and VisualNDepend.exe, on one hand, and 2 different libraries, NDepend.Analysis.dll and NDepend.UI.dll, on the other hand. The idea is to avoid having both of these libraries loaded inside the same process.

     

    NDepend.Analysis.dll and NDepend.UI.dll both rely on a lot of common code (core domain objects plus a lot of helper/utility code), hence the NDepend.Framework.dll assembly.

     

    NDepend.CQL.dll is a lightweight assembly (less than 10KB) that contains the plumbing for declaring CQL rules directly inside source code. Not only is NDepend.CQL.dll used by NDepend users who harness this possibility, we also use it extensively to declare constraints in our own code. For users who wish to declare CQL constraints inside their source code, it is more convenient to link with a single, lightweight assembly, hence the need for NDepend.CQL.dll.

     

    Finally, the NDepend.AddIn.dll assembly contains the plumbing needed to register the NDepend Visual Studio and Reflector add-ins. This assembly references big assemblies such as Reflector.exe. To prevent Reflector.exe from being loaded by mistake into one of the NDepend processes, it was a good idea to create a dedicated NDepend.AddIn.dll assembly. Also, add-in assemblies are registered in Visual Studio and Reflector by name, and the name NDepend.AddIn is well suited.
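    The isolation goals of this case study (analysis and UI code never in the same process, Reflector.exe never loaded by mistake) can be checked on the assembly dependency graph by computing what each executable transitively references. In this Python sketch, the edges are an assumption reconstructed from the prose above, not a copy of the diagram, and a static reference is only a proxy for what the CLR actually loads on demand.

```python
from typing import Dict, Iterable, Set

# Assumed edges, reconstructed from the description above (hypothetical).
DEPENDENCIES: Dict[str, Set[str]] = {
    "NDepend.Console.exe": {"NDepend.Analysis.dll", "NDepend.Platform.DotNet.dll"},
    "VisualNDepend.exe": {"NDepend.UI.dll", "NDepend.Platform.DotNet.dll"},
    "NDepend.Analysis.dll": {"NDepend.Framework.dll"},
    "NDepend.UI.dll": {"NDepend.Framework.dll"},
    "NDepend.AddIn.dll": {"Reflector.exe"},
}


def reachable(start: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """All assemblies transitively referenced from `start` (iterative DFS)."""
    seen: Set[str] = set()
    pending = [start]
    while pending:
        for dep in graph.get(pending.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                pending.append(dep)
    return seen


def violations(exe: str, forbidden: Iterable[str], graph: Dict[str, Set[str]]) -> Set[str]:
    """Forbidden assemblies that `exe` could end up loading."""
    return reachable(exe, graph) & set(forbidden)
```

    The check is most useful as a regression guard: a single new reference, say from the shared framework assembly to the add-in assembly, would silently pull Reflector.exe into every process.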

     

     

     

  • Solid State Drive: Enhance developers' productivity

     
    I just got a new laptop with a Solid State Drive (SSD), and here are some benchmark results against my desktop, which is quite a massive machine. Here are the results for some frequent developer activities; they are quite instructive:

     

    Laptop: Dell Latitude E4300, Intel Core 2 Duo SP9400 2.4GHz 32-bit, 4GB RAM, 128GB SSD with Windows XP.

     

    Desktop: ASUS Intel Core 2 Quad Q6600 2.4 GHz 64-bit, 16GB RAM, 465GB RAID 7200 RPM HD with Windows Vista Ultimate.

     
    Note that I have never had a 10,000 RPM HD in hand and don't know how an SSD and a 10,000 RPM HD compare.

      

    • Full build process of the Community and Professional editions of NDepend 2.11, with obfuscation by Dotfuscator (but without automatic tests): Laptop: 2:53, Desktop: 5:07

    • Compilation of NDepend Professional (Debug): Laptop: 7s, Desktop: 11s

    • NDepend analyzing its own code base: Laptop: 8s, Desktop: 13s

    • Starting Visual Studio 2008 SP1 and ReSharper 4.1 on the main NDepend solution: Laptop: 9s, Desktop: 11s

    • Run of 1846 NUnit tests: Laptop: 9.5s, Desktop: 26s

    • Run of 1846 NUnit tests with NCover 2.1: Laptop: 5:39, Desktop: 4:05 (I ran the test twice; this is the only result where the desktop is faster, and I don't know why!)

    • Uncompressing a 149MB rar archive (7245 files) with WinRAR: Laptop: 37s, Desktop: 2:40

    • Duplication (copy/paste) of the 149MB file: Laptop: 4s, Desktop: 11s

    • Deleting all 7245 files (727MB): Laptop: between one and two seconds, Desktop: 9s

     

    Also, the SSD laptop makes no noise (I cannot hear whether it is on or off), doesn't produce much heat and weighs 1.5 kg (3.3 lb). IMHO an SSD is an excellent way to enhance productivity, just like getting several monitors.


     



     

  • Lessons learned from a real-world focus on performance

     
    I am glad to announce that we just released a new version of NDepend where the analysis phase duration has been divided by 4 and the memory consumption has been divided by 2.

     

    An interesting question is: does such a massive performance gain mean that it was badly coded the first time? Previous, slower versions met several thousand functional requirements needed to properly analyze any .NET application. In this sense, it was nicely coded. But still, there was room for improvement, and my belief is that there is always room for better performance. Even in the current, much faster version, we identified several significant optimizations to be done. The downside is that they will require a lot of work. So the first lesson learned is that there is always room for better performance. We can complement this rule with the observation that the amount of work needed to make a program run faster grows exponentially with the expected gain.

     

    We quantified each source of performance gain and obtained this chart. As said, we obtained a 75% gain thanks to algorithmic optimization, micro-optimization and parallelization. The current analysis duration (25% of the original duration) splits into 17% spent in our code and 8% spent in third-party code (both expressed as percentages of the original duration). In our case, the third-party code is Mono.Cecil, used to build object models of the analyzed assemblies.

     

     

     

     

    The bulk of the performance gain came from algorithmic improvements, and this is not a surprise. IMHO the surprise is the relatively small gain obtained from parallelization. Indeed, we did some massive refactoring to parallelize the analysis with the parallel monologue idea in mind. Now each assembly is analyzed as a task that doesn't need any synchronization with other tasks. In other words, each task uses its own state, which is not visible from the other tasks. Unfortunately, this is not enough to ensure proper scaling with the number of processors. While we get a 15% gain going from 1 to 2 processors, the gain is almost zero going from 2 to 4 processors. We identified some potential IO contention and memory issues that will require more attention in the future. This leads to another lesson: don't expect scaling on many processors to be easy, even if you don't share state and don't use synchronization.
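    The "parallel monologue" structure can be sketched in Python: one task per assembly, each reading only its inputs and building its own result, merged only at the end. The per-assembly work here is a hypothetical stand-in. Note that in CPython, threads do not run CPU-bound Python code in parallel because of the GIL, so ProcessPoolExecutor would be the drop-in choice for real CPU-bound work; and, as the paragraph above shows, independent tasks still do not guarantee scaling.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def analyze_assembly(type_names: List[str]) -> Dict[str, int]:
    """Hypothetical per-assembly analysis: it reads only its argument and builds its
    own result, so it needs no locks and shares no state with the other tasks."""
    namespaces: Dict[str, int] = {}
    for full_name in type_names:
        namespace = full_name.rpartition(".")[0]
        namespaces[namespace] = namespaces.get(namespace, 0) + 1
    return {"types": len(type_names), "namespaces": len(namespaces)}


def analyze_all(assemblies: Dict[str, List[str]], workers: int = 4) -> Dict[str, Dict[str, int]]:
    """Run one independent task per assembly; results are merged only at the end."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(analyze_assembly, types)
                   for name, types in assemblies.items()}
        return {name: future.result() for name, future in futures.items()}
```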

     

    When it comes to enhancing performance, there is only one way to do things right: measure, and focus your work on the part of the code that really takes the bulk of the time, not the part you think takes the bulk of the time. Here is a quick real-world illustration. I have always been amazed by the performance of the C# and VB.NET compilers. Imagine: they can compile, in a few seconds, thousands of source files that took dozens or even hundreds of man-years to write. At analysis time, NDepend also needs to parse source files, but only partially, to obtain comment and Cyclomatic Complexity metrics. By measuring file loading time we had a good surprise: loading source files into memory is much cheaper than expected. For example, loading the 1,728 C# source files of NDepend takes less than 0.2 seconds out of the 13 seconds needed for a complete analysis of its own code. Now that I know loading source files into memory is almost free, I am a bit less impressed by the performance of the C# and VB.NET compilers. And this leads to an important lesson: NEVER anticipate the performance cost, ALWAYS measure it. The only professional way to measure is to rely on performance profiler tools. We use both JetBrains dotTrace and Red-Gate ANTS Profiler and are happy with both.
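    As an illustration of measuring rather than guessing, here is a minimal sketch that times the loading of every C# source file under a directory with System.Diagnostics.Stopwatch. The LoadTiming class and its LoadAllSources method are hypothetical. Keep in mind that the OS file cache can make a second run much faster than the first, which is one more reason to rely on a profiler for the full picture.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class LoadTiming
{
    // Reads every file matching `pattern` under `rootDir` fully into memory
    // and returns the total number of characters read plus the elapsed time.
    public static (long Chars, TimeSpan Elapsed) LoadAllSources(string rootDir, string pattern = "*.cs")
    {
        var sw = Stopwatch.StartNew();
        long chars = 0;
        foreach (string path in Directory.EnumerateFiles(rootDir, pattern, SearchOption.AllDirectories))
        {
            chars += File.ReadAllText(path).Length; // whole file loaded in memory
        }
        sw.Stop();
        return (chars, sw.Elapsed);
    }
}
```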

     

    Something important concerning the measurement of code performance is the percentage of time spent inside third-party code, things like DB or network access. As shown in the previous chart, in our case study 35% of the analysis time is spent inside the Cecil code. We can infer from this number that if we want to halve the analysis duration in the future, our code will need to run 4.3 times faster! ((100 – 35) / (50 – 35) = 65 / 15 ≈ 4.3). This clearly shows the importance of another lesson: assess your limits by measuring the percentage of time spent in third-party code.
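    The arithmetic can be spelled out, assuming the time spent in third-party code stays fixed at 35 units (out of the current 100) whatever we do to our own code. This is the same reasoning as Amdahl's law: even infinitely fast code of our own could not bring the total below 35.

```latex
\begin{aligned}
T &= T_{\text{own}} + T_{\text{tier}}, & T = 100,\ T_{\text{tier}} = 35 &\;\Rightarrow\; T_{\text{own}} = 65\\
T' &= T'_{\text{own}} + T_{\text{tier}}, & T' = 50 &\;\Rightarrow\; T'_{\text{own}} = 15\\
\text{required speed-up of own code} &= \frac{T_{\text{own}}}{T'_{\text{own}}} = \frac{100 - 35}{50 - 35} = \frac{65}{15} \approx 4.3
\end{aligned}
```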

     

    In the dissection of the performance gain, an interesting point is the 15% gain obtained from micro-optimizations. I mean things like using the right kind of loop, using the right kind of collection, preferring to deal with primitive CLR types such as int and string (POCO idea), choosing carefully between structures and classes, buffering property getter results into local variables, or even exposing non-private fields. I could summarize the lesson learned by saying that micro-optimizations are worth it while premature optimization is the root of all evil (as Donald Knuth said a long time ago). The difference between micro-optimizations and premature optimizations is that the former are driven by measurement while the latter are driven by guesses, intuitions and hunches.
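    One of the micro-optimizations listed above, buffering a property getter result into a local variable, looks like this in a minimal sketch (the MicroOpt class is hypothetical). The buffered version is only valid when the list is not modified inside the loop, and a modern JIT may already hoist the call, so in the spirit of this post the change should be kept only if a measurement confirms that it pays.

```csharp
using System.Collections.Generic;

public static class MicroOpt
{
    // Straightforward version: calls the list.Count getter on every iteration.
    public static long SumNaive(List<int> list)
    {
        long sum = 0;
        for (int i = 0; i < list.Count; i++)
        {
            sum += list[i];
        }
        return sum;
    }

    // Same loop with the getter result buffered into a local variable.
    // Only correct if the list is not modified while looping.
    public static long SumBuffered(List<int> list)
    {
        long sum = 0;
        int count = list.Count; // buffered property getter result
        for (int i = 0; i < count; i++)
        {
            sum += list[i];
        }
        return sum;
    }
}
```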


    Something not quantified in our performance gain is the fact that memory consumption has been roughly halved. Less memory doesn’t necessarily mean fewer managed objects, but in our particular case we do allocate fewer objects. The nice consequence is that fewer objects means less time spent in the GC. Less memory also means fewer virtual memory page faults, which are IMHO the worst thing when it comes to polishing the performance of a program.

     

    Finally, we are now experiencing a great side benefit of optimization: automatic tests naturally run much faster, which is a good motivation to run them more often!

     

    I end with an illustration of the tenet that there is always room for better performance: the Amiga demo scene, in which I participated in the early nineties. Every demo ran on the same fixed hardware. It was the perfect incentive for demo developers to produce increasingly optimized code. For several years, records were beaten every month, like the number of 3D polygons, the number of sprites or the number of dots displayed simultaneously at 50 frames per second. As far as I can estimate, the performance factor obtained in a few years was something around x50! Imagine what it means: running in one second a computation that initially took a minute. And as far as I know, this massive gain was the result of both better algorithms (with many pre-computations and delegation to the custom chips) and micro-optimizations at the assembly language level (better use of the chip registers, better use of the instruction set...).

     

  • Composing Code Metrics Values

    NDepend provides 82 different code metrics which are all explained on this page. Several software development topics are addressed by these metrics, like:

     

    Metrics constitute one feature of NDepend among several others, like comparing code base snapshots, digging into the structure of a code base through visual artifacts such as the matrix and the graph, or defining active rules.

     

    I would like to focus here on some tricks to get the most out of NDepend code metrics. Sometimes the value of a metric is an immediate indicator of code flaws. Methods with more than 6 parameters and classes with a depth of inheritance greater than 8 are obvious flaws.

     

    But at other times, the value of a code metric might be meaningless without some contextual information. For example, a class 100% covered by tests is certainly a good thing. But this information is more relevant for a large class with 300 lines of code than for a mini-class with 10 lines of code.

     

    The same goes for a large method with 100 lines of code: this is a bad thing! But if the method has a minimal cyclomatic complexity (meaning the method has no loop, if, else, switch, try/catch…), this is not such a big deal, since the method might still be easy to understand and maintain. On the other hand, a small method with 10 lines of code can become a real nightmare to maintain, especially if it writes to some fields.

     

    To handle such cases properly, we need to correlate different metrics and possibly some other properties of the code base. This is possible in NDepend thanks to the flexibility of the Code Query Language (CQL).

     

    For example, you might want to know which classes are more than 95% covered by tests but not yet 100% covered. To avoid noise, you might wish to focus on larger classes. With CQL you can use an ORDER BY clause to sort classes by their number of lines of code and a TOP clause to limit the number of matches.
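    The CQL query itself is not reproduced in this text. As a rough analogue only, the same filter, sort and limit can be sketched with LINQ over a hypothetical TypeMetrics record; the property names mirror NDepend metric names, but this is not an NDepend API.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record holding two metrics per type (not an NDepend type).
public sealed record TypeMetrics(string Name, int NbLinesOfCode, double PercentageCoverage);

public static class CoverageQuery
{
    // Types covered more than 95% but not yet 100%, largest first,
    // keeping at most `top` results (the equivalent of TOP / ORDER BY).
    public static List<TypeMetrics> AlmostCovered(IEnumerable<TypeMetrics> types, int top)
    {
        return types
            .Where(t => t.PercentageCoverage > 95 && t.PercentageCoverage < 100)
            .OrderByDescending(t => t.NbLinesOfCode)
            .Take(top)
            .ToList();
    }
}
```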

     

    As you can see, the query result automatically contains all metrics involved in the CQL query, and it is up to you to add more metrics to the query. As a bonus, for each metric displayed in the query result you get statistical information such as the average and standard deviation.

     

    Another example. Classes that use many other types (in other words, classes that have a high Efferent Coupling, Ce) are likely problematic. Indeed a high Ce often means too many responsibilities for a class. Pinpointing classes with high Ce is easy.

     

    But doing so, you will certainly match primarily UI classes, like forms and controls. Indeed, UI frameworks often foster classes with high Ce due to the need for some kind of large mediator classes to make the different UI components communicate. You might then want to discard form and control classes:

     

    And this last screenshot underlines a cool feature of CQL: depth of usage metrics. You can see that each DeriveFrom XXX condition leads to a DepthOfDeriveFrom XXX metric. Obviously, here the values of this metric are N/A since we specifically ask for classes that do not derive from XXX. But you can use this ability to build depth of usage metrics on the fly, for example to locate your UI forms and sort the list by their DepthOfDeriveFrom Form value.
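    Outside of NDepend, the idea behind DepthOfDeriveFrom can be approximated with reflection by walking the base class chain. In this sketch the InheritanceDepth class is hypothetical, a direct base class counts as depth 1 (my assumption about how the metric counts), and null plays the role of N/A. Exception types are used in the usage example to avoid referencing Windows Forms: ArgumentNullException derives from ArgumentException, which derives from SystemException, which derives from Exception, so its depth relative to Exception is 3.

```csharp
using System;

public static class InheritanceDepth
{
    // Number of inheritance steps from `type` up to `baseType`,
    // or null when `type` does not derive from `baseType`.
    public static int? DepthOfDeriveFrom(Type type, Type baseType)
    {
        int depth = 0;
        for (Type current = type.BaseType; current != null; current = current.BaseType)
        {
            depth++;
            if (current == baseType)
            {
                return depth;
            }
        }
        return null; // not derived: the equivalent of N/A
    }
}
```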

     

    The depth of usage trick is not limited to inheritance: you can use it on object creation with the condition DepthOfCreateA, on field writes with the condition DepthOfIsWritingField and, more generally, on any kind of usage with the conditions DepthOfIsUsing and DepthOfIsUsedBy.

     

    As I explained previously on this blog, the depth of usage trick has several interesting applications such as getting all paths from a code element A to another code element B

     

    … or building some call graph or class diagram as explained here
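    To illustrate what "all paths from A to B" means, here is a plain depth-first search over a dependency graph held as an adjacency list. It is a sketch, not NDepend's implementation (the CQL approach relies on depth of usage instead); the DependencyPaths class and the graph representation are assumptions. Cycles are handled by never revisiting a node already on the current path, and in a dense graph the number of paths can grow exponentially.

```csharp
using System.Collections.Generic;

public static class DependencyPaths
{
    // Enumerates every simple path (no repeated node) from `from` to `to`.
    // `dependsOn` maps each code element to the elements it uses directly.
    public static List<List<string>> AllPaths(
        IReadOnlyDictionary<string, string[]> dependsOn, string from, string to)
    {
        var results = new List<List<string>>();
        var path = new List<string> { from };
        var onPath = new HashSet<string> { from };
        Visit(dependsOn, from, to, path, onPath, results);
        return results;
    }

    private static void Visit(
        IReadOnlyDictionary<string, string[]> dependsOn, string node, string to,
        List<string> path, HashSet<string> onPath, List<List<string>> results)
    {
        if (node == to)
        {
            results.Add(new List<string>(path)); // store a copy of the current path
            return;
        }
        if (!dependsOn.TryGetValue(node, out string[] next))
        {
            return; // no outgoing dependencies
        }
        foreach (string n in next)
        {
            if (onPath.Add(n)) // skip nodes already on the path (cycles)
            {
                path.Add(n);
                Visit(dependsOn, n, to, path, onPath, results);
                path.RemoveAt(path.Count - 1); // backtrack
                onPath.Remove(n);
            }
        }
    }
}
```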

     

  • An easy and efficient way to improve .NET code performances

     
    I am currently getting serious about measuring and improving the performance of .NET code, and I’ll post more about this topic in the coming weeks. Today I would like to present an efficient optimization I found on something we all use massively: loops on collections. I am talking here about the cost of the loop itself, i.e. the for and foreach instructions, not the cost of the code executed in the loop. I basically found that:

     

    • for loops on List<T> are a bit more than 2 times cheaper than foreach loops on List<T>.
    • Looping on an array is around 2 times cheaper than looping on a List<T>.
    • As a consequence, looping on an array using for is around 5 times cheaper than looping on a List<T> using foreach (which, I believe, is what we all do).

     

    The foreach syntactic sugar is a blessing, and for many complex loops it is not worth transforming foreach into for. But for certain kinds of loops, like short loops made of one or two instructions nested in other loops, this factor of 5 can become a huge bonus. And the good news is that the code of NDepend has plenty of these nested short loops because of algorithms such as Ranking, Level and the computation of the transitive closure of dependencies.

     

    Here is the code used to benchmark all this:

     
    Here are the results obtained with the System.Diagnostics.Stopwatch class:

     

    The fact that for is more efficient than foreach comes from the fact that foreach uses an enumerator object behind the scenes.
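    The following sketch shows, approximately, what the C# compiler generates for foreach on a List<T>: a call to GetEnumerator(), then a MoveNext()/Current pair per element inside a try/finally that disposes the enumerator. List<T>.Enumerator is a struct, so there is no heap allocation; the extra cost comes from these calls and from the check MoveNext() performs to detect modification of the list during enumeration. ForeachLowering is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class ForeachLowering
{
    public static int SumWithForeach(List<int> list)
    {
        int sum = 0;
        foreach (int x in list) { sum += x; }
        return sum;
    }

    // Roughly the code the compiler emits for the foreach above.
    public static int SumLowered(List<int> list)
    {
        int sum = 0;
        List<int>.Enumerator e = list.GetEnumerator();
        try
        {
            while (e.MoveNext()) // one call per element, plus a version check
            {
                int x = e.Current; // one more call per element
                sum += x;
            }
        }
        finally
        {
            e.Dispose();
        }
        return sum;
    }
}
```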

     

    The fact that iterating on an array is cheaper than iterating on a List<T> comes from the fact that IL has instructions dedicated to arrays (such as ldlen and ldelem). Here is the Reflector view of the methods:

     

    In the console output, we can see that foreach on an array is a tiny bit more costly than for on an array. Interestingly enough, I checked that the C# compiler doesn’t use an enumerator object behind the scenes for foreach on an array.
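    For comparison, here is roughly how a foreach over a single-dimensional array is lowered: an index loop bounded by the array's length with direct element access, and no enumerator involved. ArrayForeach is a hypothetical name used for illustration.

```csharp
public static class ArrayForeach
{
    public static int SumForeach(int[] array)
    {
        int sum = 0;
        foreach (int x in array) { sum += x; } // no enumerator for arrays
        return sum;
    }

    // Approximately the shape the compiler gives the loop above:
    // the array is copied into a local and walked by index.
    public static int SumLowered(int[] array)
    {
        int sum = 0;
        int[] a = array;
        for (int i = 0; i < a.Length; i++)
        {
            int x = a[i];
            sum += x;
        }
        return sum;
    }
}
```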

     

    Another interesting finding: the profiler dotTrace v3.1 gives unrealistic results for this benchmark. IterateForeachOnList is deemed more than 40 times more costly than IterateForOnArray (instead of 5 times). I tested both the RecordThreadTime and RecordWallTime modes of the profiler:

     

    So I also ran the benchmark with ANTS Profiler v4.1 and got more realistic results, though still not perfect (IterateForeachOnList is now deemed 8 times more costly than IterateForOnArray, instead of 5 times).

     

    You can download the code of the benchmark here. All tests were run with code compiled in Release mode.
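    The original benchmark is only available as a download. A minimal sketch of such a benchmark follows: only the method names IterateForOnArray and IterateForeachOnList come from the post, while IterateForOnList, the loop bodies, the collection size and the repeat count are assumptions. Each method returns a sum so that the JIT cannot discard the loop, and each measured body runs once before timing to exclude JIT compilation. Compile it in Release mode; on a modern .NET runtime the ratios may well differ from the ones reported above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LoopBenchmark
{
    public static int IterateForOnArray(int[] array)
    {
        int sum = 0;
        for (int i = 0; i < array.Length; i++) { sum += array[i]; }
        return sum;
    }

    public static int IterateForOnList(List<int> list) // hypothetical name
    {
        int sum = 0;
        for (int i = 0; i < list.Count; i++) { sum += list[i]; }
        return sum;
    }

    public static int IterateForeachOnList(List<int> list)
    {
        int sum = 0;
        foreach (int x in list) { sum += x; }
        return sum;
    }

    // Runs `body` once to warm up the JIT, then times `repeat` runs.
    public static TimeSpan Time(Func<int> body, int repeat)
    {
        body();
        var sw = Stopwatch.StartNew();
        for (int r = 0; r < repeat; r++) { body(); }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        const int size = 1_000_000;
        var list = new List<int>(size);
        for (int i = 0; i < size; i++) { list.Add(i % 10); }
        int[] array = list.ToArray();

        Console.WriteLine($"for on array:    {Time(() => IterateForOnArray(array), 100).TotalMilliseconds} ms");
        Console.WriteLine($"for on List:     {Time(() => IterateForOnList(list), 100).TotalMilliseconds} ms");
        Console.WriteLine($"foreach on List: {Time(() => IterateForeachOnList(list), 100).TotalMilliseconds} ms");
    }
}
```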

  • 2 talks on Architecture and on Regression at Prio Conference

    I'll give 2 talks on Monday 10 November and Tuesday 11 November at Prio Conference in Baden-Baden, Germany. Here are the abstracts:

     

    Architecture and Dependencies  (Monday 15h30 - 16h45)

    Why do we architect our software? Why are concepts such as component, abstraction, cohesion, layering or IoC so popular nowadays? In this session, we will address these questions through the prism of dependencies. With appropriate tooling, a code base architecture can be concretely seen as a graph of dependencies. Good architectural practices and patterns result in enforcing some simple structural properties on this graph. Among the properties presented, we will focus on the directed acyclic graph and on the need to keep components at a low level.

    This session aims to be practically oriented. Several real-world code bases will be dissected with the tool NDepend to illustrate the principles and explanations presented.



    Avoid architecture regression with active conventions  (Tuesday 11h45 - 13h00)

    What the last decades taught us is that the source code is the design. Documentation on the reasons why a particular architectural decision has been taken sits apart from the code base. Such architectural documentation quickly becomes obsolete because the cost of keeping it in sync with the code base is too high. Intentions behind design decisions get lost and, as a result, the implementation ends up violating the initial guidance. This phenomenon is known as architecture regression or design erosion.

    In this session we will propose a way to avoid architecture erosion through the idea of custom active conventions. Concretely, the custom conventions presented are written in a dedicated language, Code Query Language (CQL), supported by the tool NDepend. CQL lets you write conventions about a wide range of architectural principles such as IoC, layering, cohesion, immutability/purity or encapsulation. Such a convention is qualified as active because, once integrated into a build process, it automatically warns as soon as it is violated.

     

  • The (near) future of Code Correctness

    I just watched this amazing PDC presentation (the most impressive I have seen so far): Contract Checking and Automated Test Generation with Pex by Nikolai Tillmann and Mike Barnett.

     

    It covers .NET 4 support and tooling for contracts. I strongly encourage you to take an hour to watch it. .NET 4 will come with a contract API, System.Diagnostics.Contracts, much more compelling than the simple (yet very useful) System.Diagnostics.Debug.Assert(…). What is not said (unless I missed it?!) is whether there will be some tooling to automatically transform existing Debug.Assert(…) calls into calls to the new contract API. I hope so! The code base of NDepend currently contains 3,470 calls to Debug.Assert(…) for 69,256 lines of code.

     

    I take this opportunity to praise the usefulness of contracts and more specifically of Debug.Assert(…). I have always felt that there is too much emphasis in the community on automatic testing compared to contracts. For me, these 2 correctness techniques are equally efficient. And actually, what the presentation shows well, thanks to the super-promising Pex tool, is that automatic testing and contracts are 2 sides of the same discipline, which consists of finding code defects automatically. Pex will promote contracts among developers addicted to automatic tests, and that is a good thing. As a side note, I explained in a previous post why and how Debug.Assert(…) checking must be activated while automatic tests run: Unit Test vs. Debug.Assert().
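    A small sketch of contracts written with Debug.Assert(…): a precondition on the input and a postcondition on the result of a hypothetical IntSqrt method. .NET 4's System.Diagnostics.Contracts would express these as Contract.Requires and Contract.Ensures, but full checking of those relies on the Code Contracts binary rewriter, so the sketch sticks to Debug.Assert, whose calls are only compiled in when DEBUG is defined. A unit test that calls IntSqrt in a Debug build therefore exercises both the test's own assertions and the contract.

```csharp
using System;
using System.Diagnostics;

public static class SqrtContracts
{
    // Integer square root: the largest r such that r * r <= n.
    public static int IntSqrt(int n)
    {
        Debug.Assert(n >= 0, "precondition: n must be non-negative");

        int r = (int)Math.Sqrt(n);
        // Correct a possible floating-point rounding error in either direction.
        while ((long)r * r > n) { r--; }
        while ((long)(r + 1) * (r + 1) <= n) { r++; }

        Debug.Assert((long)r * r <= n && (long)(r + 1) * (r + 1) > n,
            "postcondition: r is the integer square root of n");
        return r;
    }
}
```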

     

    My feeling has always been that automatic tests and the code itself are 2 different ways to express the same thing: how a piece of code should behave. Code tells the machine how to do it, step by step (do this to compute the result from an input), while automatic tests are a more declarative way of saying what the code should do (this result must be computed from this input), which can be seen as a specification. Automatically generating tests from code, which is the job of Pex, makes sense because, as just said, the code already contains the information needed to test it. There is a problem however: if the code contains behavior defects, Pex will generate a flawed specification. This is why contracts are needed: to assert specifications that cannot be inferred from the code and to check whether the code abides by the contract rules.


    Bets are open: C# 5 will come with some syntactic sugar that generates IL code calling the .NET 4 contract API under the hood, a bit like the using {...} syntactic sugar calls the IDisposable API.

     

  • Advice for developers on starting an Independent Software Vendor (ISV) business

     

    As a developer (and if you read my blog, there is a good chance you are a developer) you have an enormous potential to start a company. I think that most developers don’t realize how big this potential is. I started an ISV business by selling the tool NDepend for .NET developers. I am still in the process of growing the business and it is a great adventure. In this blog post I would like to share some advice that could help a developer do the same.

     

    The cost of developing and selling software

     

    What do you need to start developing software? A good PC, some development tools and an internet connection: an investment of at most US$3,000 (which you have certainly already made). Most other engineering industries need millions of dollars of investment. Think of the cost of starting a business that produces physical goods like electronic devices.

     

    Minimal investment is a good incentive, but the biggest advantage of software over other engineering industries is the unit cost. What does it cost to build one license of a piece of software? US$0. What does it cost to build 1,000 licenses? US$0.

     

    I am caricaturing, and the provocation is intentional. Of course the cost of producing software is the investment made in development. Also, if you have 10,000 users, the cost of support could be 10 times the cost of supporting 1,000 users. Concerning support, it is all about making it scale. Our experience with our tool shows that playing by the agile rules pays off. Releasing often to make sure that all reported bugs are fixed, investing in quality, investing in a neat UI by listening carefully to feedback, making sure to update the documentation each time a user asks a question: yes, it pays. The number of questions asked and bugs reported stays small and does not increase linearly with the number of users.

     

    Advantages of becoming an ISV over doing another software development job

     

    Before digging further, I would like to underline the advantages of being an ISV over being a consultant, an employee or a trainer. Here are my thoughts:

    ·         You spend most of the time doing what you really like, coding.

    ·         Freedom! There is nobody to tell you what you should work on, when you should work or how much quality you should put into your code. You can also (more or less) choose who you want to work with and where you work from. South of France or a tropical island, nobody cares except you. The availability of a decent and reliable internet connection is the only limit.

    ·         Day after day, you capitalize on a code base and you can begin to dream. Successful small ISV businesses sometimes end up being bought by bigger companies.

    ·         The amount of incoming money is not proportional to the work put in. If the software doesn’t sell well, this is a very bad point. But in the case of successful sales, it can bring much more income than any consultancy position. The reason is clear: as explained above, you can make it so that selling 1 or 1,000 licenses costs you the same. Software is the only industry that makes this magic possible. Is it by chance that the richest man in the world during the last 13 years was primarily a software developer?

    ·         Starting a business tends to be a positive point on a CV. Even if your business fails, it doesn’t mean that you lost time in your career. During a job interview you can talk about the courage and motivation you had starting such an adventure and what you learnt along the way. Many interviewers you’ll meet are lifelong employees who might be impressed by what you dared to do.

     

    Controlling the human factor

    The only human factor you can control is your own motivation. How can you prevent anyone on earth potentially