Eric Harmon

Hi, I'm developing a new application, and a new database to go along with it. I'm trying to develop version 1 with an eye towards LINQ To Entities so I'll be positioned to take advantage of it when it ships. I don't have the luxury of having a DBA, so I have to create the database structure myself, and I'm not a database expert :-(

In my database I have a Contacts table (which is abstract), a Persons table, and a Companies table. Basically, a contact can be a person or a company, and I'll never create a contact itself - it's just a base "class". So my contact table looks like this:

ContactID uniqueidentifier <- primary key
WebSite varchar(100)
...
TS timestamp <- for concurrency

The company table looks like this:

ContactID uniqueidentifier <- primary key, and matches ContactID in the Contacts table
CompanyName varchar(50)
...
TS timestamp <- for concurrency

Now, here are my questions:

1) This first question is most likely unrelated to LINQ To Entities, but should I use the same primary key for both the Contacts table and the Companies table (i.e., ContactID), or should the Companies table use a CompanyID as the primary key and then have another column named ContactID that links back to the ContactID in the Contacts table? Because this is a 1:1 join (well, sort of: there will never be more than one company with the same ContactID, but there will certainly be records in the Contacts table without an associated company record), I'm thinking I should use ContactID as the primary key for both tables.

2) The samples I've seen would suggest having a discriminator column in the Contacts table. However, pre-LINQ I would just do something like SELECT ... FROM Companies INNER JOIN Contacts ON Companies.ContactID = Contacts.ContactID. No need for a discriminator in order to retrieve all companies. Is the discriminator mandatory when I go to LINQ (in which case I'll put it in right now even though I won't really use it yet)? I realize it's used if I want to select all contacts and have it create a polymorphic list with some Person objects and some Company objects in it. I'll never be doing that - I'll either be retrieving a list of Companies or a list of Persons.

3) In terms of RI, I want to set this up with cascaded deletes so both tables get deleted (from) at the same time. Should I cascade from Contacts to Companies, so when I delete the contact record the associated company record gets deleted? Or should I cascade from Companies to Contacts, so when I delete the company record the associated contact record gets deleted? Or does LINQ just figure it all out, so I don't need to cascade deletes at all?

Thanks in advance for any best practices and ideas you can offer.

-Eric Harmon



Re: ADO.NET (Pre-release) A couple LINQ To Entities questions

Eric Harmon

I forgot one additional question:

4) Am I going to get into trouble by having a timestamp column named TS in both the Contacts table and the Companies table? I'm just wondering because the class hierarchy is going to try to create a base class (Contact) with a TS property in it and then a derived class (Company) with a property named TS, right? Should I rename these columns to ContactTS and CompanyTS in order to avoid a naming collision, or isn't that going to be an issue with LINQ to Entities?

(Actually, every single table in my database has a TS column so I wonder if I should rename them all...)

-Eric





Re: ADO.NET (Pre-release) A couple LINQ To Entities questions

Zlatko Michailov - MSFT

Eric,

As you have sensed already, your questions are not LINQ-related. They touch on a fundamental design decision you need to make: how to model your class hierarchy in the database. The options are (I'm not suggesting any preference):

· Table per Hierarchy (TPH): There is one table whose schema is the union of all leaf classes, i.e. it contains all the possible columns that a class may have. An instance of a class then populates only the relevant columns. In this case, you need an additional discriminator column.

o Advantage: this is the simplest modeling scenario; no redundant definitions; efficient access.

o Disadvantage: many unused cells in the table; the model is not obvious.

· Table per Type (TPT): In this case types from the hierarchy are mapped 1:1 to tables in the database. All the base portions of all derived types reside in the same table. Tables that represent derived types must duplicate the key columns and must contain only the new properties that the derived type introduces. In this case you don't need a discriminator column.

o Advantage: no wasted space; no redundant definitions; clean model.

o Disadvantage: JOINs are required to construct derived classes.

· Table per Concrete type (TPC): In this case there must be a table for each non-abstract class that has all the columns that class may have, including all columns from all ancestor classes. You don't need a discriminator.

o Advantage: efficient access; somewhat cleaner model than TPH; no wasted space.

o Disadvantage: redundant definitions.
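To make the TPT option concrete, here is a minimal sketch of the layout Eric describes: the derived tables reuse ContactID as their own primary key, there is no discriminator column, and a plain INNER JOIN returns only companies. It uses Python's built-in sqlite3 module purely so that it can run anywhere; the TEXT key stands in for uniqueidentifier, the TS columns are left out, and the Persons.LastName column, the add_contact helper, and the sample data are assumptions for illustration, not part of the original schema.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base table: columns shared by every kind of contact.
    CREATE TABLE Contacts (
        ContactID TEXT PRIMARY KEY,   -- stand-in for uniqueidentifier
        WebSite   TEXT
    );
    -- Derived tables: same key as the base row, plus only their own columns.
    CREATE TABLE Companies (
        ContactID   TEXT PRIMARY KEY REFERENCES Contacts(ContactID),
        CompanyName TEXT
    );
    CREATE TABLE Persons (
        ContactID TEXT PRIMARY KEY REFERENCES Contacts(ContactID),
        LastName  TEXT                -- hypothetical column
    );
""")


def add_contact(derived_table, name_column, name, web_site):
    """Insert the base row and the derived row under one shared key."""
    contact_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO Contacts (ContactID, WebSite) VALUES (?, ?)",
        (contact_id, web_site),
    )
    # Table and column names come from this script, never from user input.
    conn.execute(
        f"INSERT INTO {derived_table} (ContactID, {name_column}) VALUES (?, ?)",
        (contact_id, name),
    )
    return contact_id


add_contact("Companies", "CompanyName", "Contoso", "http://contoso.example")
add_contact("Persons", "LastName", "Harmon", None)

# No discriminator needed: the join itself selects only company rows.
companies = conn.execute("""
    SELECT Companies.CompanyName, Contacts.WebSite
    FROM Companies
    INNER JOIN Contacts ON Companies.ContactID = Contacts.ContactID
""").fetchall()
contact_count = conn.execute("SELECT COUNT(*) FROM Contacts").fetchone()[0]
print(companies, contact_count)
```

Both contacts have a row in Contacts, but only the company comes back from the join, which is the behaviour question 2) relies on.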

With all that information you should choose which mapping scenario is best for your application, and you should answer questions 1) and 2) regardless of LINQ.

On question 3). In your specific case, I assume you'll never deal with Contacts directly. Instead, you'll manipulate Companies and Persons. That means you don't need to worry about RI because the base class is not independent of the derived class. Entity Framework will take care of deleting the Contact part once you delete the Company or the Person.
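If a database-level cascade is still wanted (for example, for code that bypasses the entity layer), note that a declarative foreign key can only cascade from the referenced table to the referencing one. With Companies.ContactID referencing Contacts, that means Contacts to Companies. The sketch below shows this with sqlite3; it is an illustration under that assumption, not what Entity Framework generates, and the 'c1' key and sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Contacts (
        ContactID TEXT PRIMARY KEY,
        WebSite   TEXT
    );
    CREATE TABLE Companies (
        ContactID   TEXT PRIMARY KEY
                    REFERENCES Contacts(ContactID) ON DELETE CASCADE,
        CompanyName TEXT
    );
""")

conn.execute("INSERT INTO Contacts VALUES ('c1', 'http://contoso.example')")
conn.execute("INSERT INTO Companies VALUES ('c1', 'Contoso')")

# Deleting the base row removes the dependent Companies row as well.
conn.execute("DELETE FROM Contacts WHERE ContactID = 'c1'")
remaining = conn.execute("SELECT COUNT(*) FROM Companies").fetchone()[0]
print(remaining)
```

Deleting a Companies row directly would leave its Contacts row behind, so any code that deletes at the table level would have to delete the Contacts row to remove both.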

On question 4), I'll refer to the mapping scenario again: in TPH you have a single table anyway; in TPT you only need a timestamp in the base class; and in TPC concrete types don't collide.

Good luck!

Zlatko Michailov

Program Manager, ADO.NET

Microsoft Corp.






Re: ADO.NET (Pre-release) A couple LINQ To Entities questions

Eric Harmon

Zlatko,

Thanks for the detailed reply. I'm thinking I like the TPT method, but I'd like to read more on this. Can you suggest a book or article or something to read that will dig into this more deeply?

I'm not sure I completely understand why I only need a timestamp in the base class in TPT. If all I'm doing is updating a derived class (say, company), I would write something like this:

UPDATE Companies SET CompanyName = @Name WHERE ContactID = @ContactID

But I don't have a timestamp to compare against. I realize there's a timestamp in the base (Contact) table, but I'm not updating that one. How would I write something like the following?

UPDATE Companies SET CompanyName = @Name WHERE ContactID = @ContactID AND <Contact's TS> = @TS

-Eric





Re: ADO.NET (Pre-release) A couple LINQ To Entities questions

Zlatko Michailov - MSFT

Eric,

The idea behind TPT is not rocket science. You can search the web for "object-relational mapping" (ORM) and you'll find enough information. What's more important is to understand the chosen platform's implementation. In this case - ADO.NET. We have a post on our team blog dedicated to inheritance: http://blogs.msdn.com/adonet/archive/2007/03/15/inheritance-in-the-entity-framework.aspx

Speaking of the platform's implementation, I'll explain why a single timestamp in the base type is sufficient. When you program against an entity model, you have to think at the entity level. That means you shouldn't worry about the actual mapping of the inheritance. You'll be updating instances of entity types, not rows in database tables. Assuming you've properly marked the TS property for concurrency control, whenever you update any properties of Company, be it solely in the derived portion, our update pipeline will simulate an update on the corresponding row of the base table to have the TS recomputed. That is because we have sufficient metadata, and we "know" we have to do that.
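For readers who want to see the idea at the table level, the following is a rough sketch of that pattern, not the SQL the Entity Framework update pipeline actually emits. Every change to a Company first touches the base Contacts row, guarded by the version the caller originally read, and only then updates the derived row. SQLite has no SQL Server timestamp type, so an integer Version column stands in for TS; that column, the rename_company helper, and the sample data are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contacts (
        ContactID TEXT PRIMARY KEY,
        WebSite   TEXT,
        Version   INTEGER NOT NULL DEFAULT 1   -- stand-in for the TS column
    );
    CREATE TABLE Companies (
        ContactID   TEXT PRIMARY KEY REFERENCES Contacts(ContactID),
        CompanyName TEXT                        -- no version column here
    );
    INSERT INTO Contacts (ContactID, WebSite)
        VALUES ('c1', 'http://contoso.example');
    INSERT INTO Companies (ContactID, CompanyName)
        VALUES ('c1', 'Contoso');
""")


def rename_company(contact_id, new_name, expected_version):
    """Update the derived row only if the base row's version is unchanged."""
    with conn:  # one transaction: commit on success, roll back on error
        # Touch the base row; this is both the concurrency check and the bump.
        touched = conn.execute(
            "UPDATE Contacts SET Version = Version + 1 "
            "WHERE ContactID = ? AND Version = ?",
            (contact_id, expected_version),
        ).rowcount
        if touched == 0:
            return False  # someone else changed this entity in the meantime
        conn.execute(
            "UPDATE Companies SET CompanyName = ? WHERE ContactID = ?",
            (new_name, contact_id),
        )
        return True


first = rename_company("c1", "Contoso Ltd", 1)   # version still matches
second = rename_company("c1", "Fabrikam", 1)     # stale version, rejected
name = conn.execute(
    "SELECT CompanyName FROM Companies WHERE ContactID = 'c1'"
).fetchone()[0]
print(first, second, name)
```

The second call fails because the first one already bumped the base row's version, even though neither call changed any Contacts data column. That is why a single concurrency column on the base table can cover the whole entity.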

Zlatko Michailov

Program Manager, ADO.NET

Microsoft Corp.