Vivek C.


Hi,

I have a many-to-many relationship scenario and I had used a group dimension instead of the bridge table concept.

I had referred "Data Warehouse Lifecycle Toolkit" by Kimball and others.

For ex:
An Athlete belonging to multiple athletic teams and with the possiblity of changing every year.

So I had created an Athlete Team dimension, with a group key and the team code.

Athelete Group Key,
Athelete Team Code,
Athelete Team Name

with Athlete Group Key and Athlete Team Code as the key for the table.

And in the fact/measure group, referring to the Group Key.

Now, in SSAS, I had created the Athlete Team dimension with Group Key as the key atrribute and others as dimension attributeS.


***********************
Athlete dimension table
-----------------------

Group Key, Athlete Code, Athlete Name,
1 , FTB , Football
1 , TEN , Tennis
2 , TEN , Tennis
2 , VOL , Volleyball

Fact/Measure group
------------------

Athlete ID, Year, Group Key
123 , 2001, 1
234 , 2001, 2

************************

When processed and browsed the cube, for the number of students for each type of team, the result is considering only the first team for the count (it is ordered alphabetically).

In the sample data example above, the count of athletes for team Football, Tennis and Volleyball are 1, 2 and 1 respectively.

But in SSAS, the count of athletes for the team Football, Tennis and Volleyball are 1, 1 and 0 respectively.

For the same requirement, the SQL query on the SQLDB, is fetching me the correct result, but this is required for adhoc reporting as well and hence have to be done using the cube itself.

Comments on this would be really helpful.

Thanks,
Vivek C.




Re: Many to many relationships

Bryan C. Smith


Vivek,

I might just be getting stuck on the example, but this sounds like a Type 2 changing dimension and not a many-to-many scenario.

So, you have this Athlete dimension. It's primary key is AthleteID and contains a reference to the TeamID. Here is the structure of the table:

create table Athlete (

AthleteID int not null identity(1,1),

Name varchar(200) not null,

TeamID int not null,

RowStartDate datetime not null default (getdate()),

RowEndDate datetime null,

IsRowCurrent bit not null default (1)

)

alter table Athlete add

constraint PK_Athlete primary key (athleteid),

constraint AK_Athlete unique (name, rowstartdate),

constraint FK_Athlete_TeamID foreign key (teamid) references Team (TeamID)

create table Team (

TeamID int not null identity(1,1),

Name varchar(200) not null

)

alter table Team add

constraint PK_Team primary key (teamid),

constraint AK_Team unique (teamid)

Every time an athlete changes a team, a new record is created for the record. The previous record gets end dated and the IsRowCurrent flag gets properly set. In your cube, you can use these two tables in a single dimension or do two dimensions and create a referenced relationship.

If an athlete could belong to multiple teams simultaneously, you would then have a many-to-many relationship. In this scenario, you would have Athlete and Team dimension tables. The Athlete table would not have a reference to the Team dimension table. Instead, you would have a AthleteTeam table which contains a reference to the Athlete dimension table and another to the Team dimension table. This is your bridge table concept.

Let's say a single record in the fact table pointed to multiple athletes, such as athletes involved in a play (think baseball). This would be the group table you mentioned. You would have an Athlete table like described above that points to Team. You would build a table that assigns a single group ID to the list of players involved, and a Group table that represents that list. The fact table would point to the Group table.

create table AthleteGroup (

AthleteGroupID int not null identity(1,1),

NumberOfAthletes int not null

)

alter table AthleteGroup add constraint PK_AthleteGroup primary key (athletegroupid)

create table AthleteGroupAthlete (

AthleteGroupID int not null,

AthleteID int not null

)

alter table AthleteGroupAthlete add

constraint PK_AthleteGroupAthlete primary key (AthleteGroupID, AthleteID),

constraint FK_AthleteGroupAthlete_AthleteGroupID foreign key (athletegroupid) references AthleteGroup (AthleteGroupID),

constraint FK_AthleteGroupAthlete_AthleteID foreign key (athleteid) references Athlete (AthleteID)

The AthleteGroupAthlete table would then be used to create a measure group with a single measure (using a COUNT aggregation). This becomes the intermediate measure group in the many-to-many relationship.

Good luck,
Bryan







Re: Many to many relationships

Vivek C.

Hi Bryan,

Thanks for the reply.

My requirement is for an athlete belonging to multiple teams simultaneously. And the way you have explained in the second part, would have been the right way to move ahead.

But currently, the way it is implemented is as explained my first post. Is there a way to make it work with some settings in Analysis Services Solution As said earlier the concept of the grouping works, when we work on the relational database based data warehouse.

Thanks,

Vivek C.






Re: Many to many relationships

Bryan C. Smith

Sorry for not replying sooner. For some reason I'm not getting alerts regularly.

Regarding the Athelete dimension, I'm afraid I don't see a way to implement it the way you describe. Is changing the model an option for you

Thanks,
Bryan






Re: Many to many relationships

Vivek C.

Changing the data model would be difficult, as we have already done with the ETL work and certain set of reports.

Anyway thanks for the inputs, let me see whether I can talk others into changing the model.

Thanks,

Vivek





Re: Many to many relationships

Matt Carroll

The simplest representation seems to be a fact table reperesenting membership in a team and then a Team dimension, an Athlete dimension, and a Time dimension (with a granularity of year). Then the fact table simply contains a row for each team an athlete is on for each year the athelete is on that team. Then your measure group can contain a single measure that is a count of rows which you can filter based on any combination of team, athlete, or year. If I recall correctly, Kimball calls this type of design a "factless fact table".




Re: Many to many relationships

Vivek C.

Hi Matt,

Thanks for the inputs. The reason for not implementing using the factless fact (this was our first option), was data explosion.

Consider the case where the Athlete can belong to different team simultaneously and this changing every quarter. In this case the number of records would keep increasing periodically (quarterly). So we thought we would go the grouping of athlete teams.

Though the grouping option works with the data warehouse, we are not able to get it work properly using SSAS.

My guess is that SSAS handles many to many relationships using the factless fact table.

Thanks,

Vivek





Re: Many to many relationships

Matt Carroll

You'd be surprised at how many records you can handle if you keep them smalll by using int or smallint for your foreign keys. Assuming 3 int foreign keys, each row would cost you 3*4=12 Bytes without any indexes and SSAS doesn't need indexes. 1000 athlethes on an average of 4 teams over 100 years at the quarter level granularity would need 1000*4*100*4=19,2000,000 rows. This sounds like a lot, but at 12 bytes per row this is only 18.3MB which is tiny.

I must admint, I don't fully understand the design you originally described. It sounds like two dimensions and one fact table with no many-to-many relationships in AS terms (that is, both dimensions are regular dimensions for the measure group.) Based on this, I suspect the problem you are currently seeing is caused by dimension keys not being unique for the Athlete Team table since Team Code will occur multiple times in the Ahtlete Team table. I think what you may want to do in this design is to break out Group Key from the Ahtlete Team table and create a GroupToTeam table contain Group Key and Team code. This table would be a fact table in AS which would then be used to create a many-to-many relationship from your other Fact table to the Athlete Team table.






Re: Many to many relationships

Vivek C.

Matt,

The way you have explanied of breaking the Group Key from the Athelete Team table/dimension is the best thing to do.

Anyway, we have gone ahead and changed our data model to the way exactly SSAS handles, viz. by factless fact tables.

Thanks for the inputs.

Vivek