News, code, articles, rants; a daily dose of programming rigmarole.

Tuesday, October 7, 2008

Typed DataSets versus Domain Model

Recently I worked on a project that used typed DataSets as a domain model. Because of my experiences on the project, I decided to post about the usage of typed DataSets for domain modeling. I personally didn't have much experience with typed DataSets since I always preferred classic domain modeling using good OO principles and user-defined types, so this was a nice new task.


Initial Thoughts
When I started on the project, my first task was to create a WCF persistence service. This persistence service would interact directly with two typed DataSets in the way of classic CRUD.

My initial thought was that typed DataSets were pretty nice to use. It was quite easy to populate them from a resultset retrieved from a SQL Server database. The built-in support within the ADO.NET API was nice, as was the built-in support for simple data validation (disallowing DBNull and such). Overall, my first impression was that typed DataSets are nice to work with, almost fun.


Reality Set In
Those thoughts quickly waned. What I started to find is that typed DataSets allow inconsistent configuration; for example, you can set a column's AllowDBNull property to false but still set its default value to DBNull.
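Here is a minimal sketch of that inconsistency, using a plain DataTable rather than a designer-generated typed DataSet. The table name, the column name and the AllowDbNullDemo helper are all made up for illustration; only the ADO.NET types are real.

```csharp
using System;
using System.Data;

// Hypothetical helper, not part of any real project.
public static class AllowDbNullDemo
{
    // Builds a one-column table whose column rejects nulls
    // but still carries DBNull as its default value.
    public static DataTable BuildCarTable()
    {
        var table = new DataTable("Car");
        var name = new DataColumn("Name", typeof(string));
        name.AllowDBNull = false;          // nulls are not allowed...
        name.DefaultValue = DBNull.Value;  // ...yet the default is DBNull
        table.Columns.Add(name);
        return table;
    }

    // Returns true when adding a row that relies on the default value fails.
    public static bool DefaultRowIsRejected()
    {
        DataTable table = BuildCarTable();
        try
        {
            table.Rows.Add(table.NewRow());
            return false;
        }
        catch (NoNullAllowedException)
        {
            return true;
        }
    }
}
```

Nothing complains when the column is configured; the contradiction only shows up later, when a row that relies on the default is added.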

Continuing on, I discovered other problems that I consider serious. One is a very simple modeling issue: multiplicity. Simply stated, you cannot enforce a 1:1 relationship between two tables in a dataset. More accurately stated, you cannot control the multiplicity of relationships. That is, if you're modeling automobiles and you have a Car table, you cannot enforce that the Engine table has only one record per car; by nature in relational models, a foreign key is 1:M, and with typed DataSets this is no different. With classic domain modeling, this is of course trivial. There are also problems with aggregation, composition and generalization with typed DataSets; these will be discussed later.
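To show what "trivial" means here, below is a minimal sketch with user-defined types. Car and Engine come from the example above, but the members (Horsepower, the constructors) are assumptions added for illustration.

```csharp
using System;

// Illustrative types; the members are assumptions, not from a real project.
public class Engine
{
    public Engine(int horsepower)
    {
        Horsepower = horsepower;
    }

    public int Horsepower { get; private set; }
}

public class Car
{
    private Engine _engine;

    // A car cannot be constructed without an engine.
    public Car(Engine engine)
    {
        Engine = engine;
    }

    // A single reference means at most one engine;
    // the null check means at least one.
    public Engine Engine
    {
        get { return _engine; }
        set
        {
            if (value == null) throw new ArgumentNullException("value");
            _engine = value;
        }
    }
}
```

The single Engine property is what pins the multiplicity to exactly one engine per car. This sketch does not stop two cars from sharing the same Engine instance; that would need an extra check.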


Good Does Exist
There are some good things that you get with typed DataSets. For one, you get free ORM, which in my book is huge. Restated, you don't have to write the mapping yourself since you can use built-in ADO.NET; phrased yet another way, you only have two choices for ORM: manual property setting and DataAdapter.Fill. Another nicety is that, when you use the DataSet designer, Visual Studio automatically generates strongly-typed lookup methods on a DataTable for keyed columns, such as FindByCarID() for a primary key column named CarID.
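For comparison, here is a rough sketch of the "manual property setting" choice: mapping each record onto a user-defined type by hand. CarRecord, CarMapper and the Id/Model column names are hypothetical; IDataReader and its methods are standard ADO.NET.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical entity and mapper, for illustration only.
public class CarRecord
{
    public int Id { get; set; }
    public string Model { get; set; }
}

public static class CarMapper
{
    // Copies each record from the reader into a CarRecord, column by column.
    public static List<CarRecord> Map(IDataReader reader)
    {
        var cars = new List<CarRecord>();
        int idOrdinal = reader.GetOrdinal("Id");
        int modelOrdinal = reader.GetOrdinal("Model");
        while (reader.Read())
        {
            var car = new CarRecord();
            car.Id = reader.GetInt32(idOrdinal);
            // DBNull has to be translated by hand as well.
            car.Model = reader.IsDBNull(modelOrdinal) ? null : reader.GetString(modelOrdinal);
            cars.Add(car);
        }
        return cars;
    }
}
```

Every column needs its own line of mapping code, and that is exactly the work DataAdapter.Fill takes off your hands when the target is a DataSet.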

Another nicety is a DataSet's built-in support for data binding. One concern with classic domain modeling is that you have to build in data binding support yourself. What I've done to overcome this is create reusable base classes that implement the required interfaces (such as INotifyPropertyChanged) and support undo/redo, and use BindingList for binding entities to a DataGridView. Even so, data binding with domain models is largely a manual (and sometimes complex) effort. My experience has been that the trade-off is worth it from an architectural quality standpoint.
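A stripped-down sketch of that kind of base class might look like the following. EntityBase and CarEntity are hypothetical names, and undo/redo is left out entirely to keep it short.

```csharp
using System.ComponentModel;

// Hypothetical reusable base class for bindable entities.
public abstract class EntityBase : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    // Raises PropertyChanged so bound controls can refresh.
    protected void OnPropertyChanged(string propertyName)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}

// Hypothetical entity built on the base class.
public class CarEntity : EntityBase
{
    private string _model;

    public string Model
    {
        get { return _model; }
        set
        {
            if (_model == value) return;  // no notification when nothing changed
            _model = value;
            OnPropertyChanged("Model");
        }
    }
}
```

Put the entities in a BindingList<CarEntity> and assign the list to a DataGridView's DataSource. Because the item type implements INotifyPropertyChanged, the list forwards property changes as ListChanged events and the grid can refresh the affected row.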



Typed DataSets vs Domain Modeling: Which to Use?
I could probably spend another 10-15 pages on why I always prefer domain modeling over typed DataSets, but it all boils down to what adds real business value.

What I've found is that you will generally spend less time building your data layer when using typed DataSets. This is because you can use the ADO.NET API for data access and ORM; that is, you can depend on an IDataAdapter implementation to fill the DataSet. That means that purely in the interest of RAD, typed DataSets shorten time-to-market.

That being said, for the reasons mentioned above, typed DataSets cannot produce accurate domain models, nor were they ever meant for such a thing. Typed DataSets work like relational databases, which do not support even basic OO principles. Of course typed DataSets are classes, so there is some support for OO (after all, typed DataSets descend from DataSet), but typed DataSets in themselves are not the entity: the tables they contain are, which is why the application of OO fails when modeling a domain.

Simply put: user-defined types are the right approach to domain modeling. With user types, you can support generalization, (accurate) multiplicity, (better) code reuse and other OO principles that improve the overall architectural quality. Development teams can spend lots of time getting products to market, but the simple fact is that architectural quality will have a positive or negative impact on long-term and repeatable success, and all too often not enough time is spent on quality since its ROI is generally not measured. This is often followed by more bad code and bad architectural decisions, but that's a discussion for another day (blog?).

Monday, October 6, 2008

Software Modeling: Why Bother?

If you've written software for any length of time professionally, or even if you're in school, you have likely come across at least one UML diagram. If not, let's have a quick refresher (or introduction for those under the proverbial rock).

UML is a formal language created to model different aspects of software construction, user interactions with software, activities that code and/or users carry out, and more. UML defines more diagram types than I will cover here; the ones you are most likely to run into are Class, Sequence, Activity, and Use Case diagrams. A quick description of each is below:
  • Class diagrams - these diagrams are used to present visual models of software classes, and are generally used for languages with some measure of classic OO like C++, C#, Java and others.
  • Sequence diagrams - a sequence diagram is a high-level diagram used to visually depict the sequence of events for a given software module, IPC/RPC call, a user-initiated event in a software system, and just about anything that contains a sequence of events that can be illustrated (even washing clothes, which I can graph easily but I try to avoid in real life).
  • Activity diagrams - activity diagrams are used to illustrate a specific activity. This is another high-level diagram that depicts sequences of events, except that an activity diagram is more thorough in illustrating how those sequences are carried out.
  • Use Case diagrams - use case diagrams are used to visually illustrate what a given item will be used for. This is usually used to depict use cases for a screen in an application, a software module, or anything that has uses and is useful (like a washing machine).
In case you haven't noticed, UML diagrams are used to visually convey information. That means the consumer of a UML diagram should be able to visually grasp the information being conveyed. If this is not possible, the diagram isn't useful, and in practice, this happens all too often (like when a class diagram has a few hundred classes). What this means is there are some good guidelines on what should go into a UML diagram (Martin Fowler has a great book on the subject). For now I hope I've conveyed lexically the visual nature of UML diagrams.

Class Diagram Example
Sometimes the best way to illustrate stuff is visually (are we getting somewhere yet?), so let's look (visually) at an example of how class diagrams provide the ability to convey information about an object model.

Below are 3 classes and an enumeration in C#. First I show the code for each type, followed by a UML class diagram. Which would you say helps you get the full picture more rapidly and definitively?

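// Shared base type: every person has a name and an address.
public abstract class PersonBase {
    public string Name { get; set; }
    public string Address { get; set; }
}

// A citizen is a person who also has a type of car.
public class Citizen : PersonBase {
    public CarTypes CarType { get; set; }
}

// An employee is a person with a pay rate and a department.
public class Employee : PersonBase {
    public double PayRate { get; set; }
    public string Department { get; set; }
}

// The kinds of vehicle a citizen can have.
public enum CarTypes {
    Car,
    Truck,
    Van
}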




You can see how the object model is depicted much more succinctly and rapidly with the UML class diagram. You can view all classes at the same time, along with how they relate to each other. With code files, you only have the language syntax to help you understand the interactions between classes. While developers can be expected to know the language well enough to understand the object model, when discussing architecture and design, thoughts are conveyed much more succinctly using modeling than they are using code files.

Sequence Diagram Example
Below is the sequence of events that occurs when washing clothes, listed as bullet points.

  • Wear clothes; dirty 'em up
  • Load washer with clothes
  • Pour soap in
  • Set washer
  • Start washer
  • Wait for washer to finish
  • Move to dryer
  • Set dryer
  • Start dryer
  • Wait for dryer to finish
  • Fold clothes
  • Put clothes away
Let's take a look at the sequence diagram for the above steps:




Sequence diagrams are much better at conveying the actual order of steps and the interaction between systems. Another thing sequence diagrams do much better than a bulleted list is looping, which I won't demonstrate here.

UML as Blueprints
When an architect is designing a building, the architect will use blueprints, which are visual drawings created using complex CAD software. They visually depict the height and width of rooms, the location of pillars in the building, the size and angle (property lines) of the lot the building will reside on, the location of walls, doors, and windows, and many more things that I won't mention. When conveying the design of the building to anyone, the architect will rely on these blueprints to convey every last detail of the building with complete accuracy.

When a construction team is hired to create the building, the construction team will use these blueprints to ensure the architect's design is implemented correctly. Along the way, the construction team may point out flaws in the architecture that have to be corrected. In this case, the use of blueprints makes perfect sense. In fact, it could be stated that blueprints are required to construct the building with any level of fidelity to the architect's design.

In the book Code Complete (2nd edition), building software is likened to formal construction of buildings; in fact, the terminology "writing" software is eschewed for "building" software and UML diagrams are likened to software "blueprints". In other words, it is stated that software construction parallels building construction: architects design the software, developers construct the architecture using the architect's output, QA staff perform "building" (quality) inspections, and UAT processes provide the "walk-through".

Thinking about software as construction leads you to see the benefit of visual documentation. UML serves this purpose very well, as it allows you to blueprint every aspect of your system, from business requirements down to implementation details such as abstract base classes and descendants, interfaces and realizations (implementors), and so on. It also helps you to see how applying a development process to building software aids in creating repeatable success; that is, building and shipping software is one thing, being able to do it over and over in a systematic fashion is quite another, and the latter yields more long-term success.

Closing
I hope that I've demonstrated how things can be conveyed much more succinctly and rapidly using visual diagrams. I hope that you can see the benefit of class diagrams and sequence diagrams, and I would encourage you to look into activity and use case diagrams. In other words, I hope that you can see the real benefit of UML documentation. Not only does it provide visual aids for understanding a software system, it can also act as a true software blueprint when formal architecture precedes writing code (and even when it doesn't).

UML can be used to get new team members up to speed on the architecture. UML can help the business players on your team really understand what developers are doing in that dark room, and can help everyone speak the same language. Finally, UML introduces formal architecture, which gets developers/architects thinking about the solution to the problem (the implementation) prior to writing any code, which lends itself to an accurate, high-quality implementation.