Data Mining
The cluster profile chart has to handle a few different kinds of information:
- multinomial attributes, presented as histograms
- Missing state of continuous/discrete attributes (also, as histogram)
- Existing state of continuous attributes (presented as mean+StdDev)
A generic query that returns all the information grouped by the cluster is:
select
NODE_CAPTION AS [Cluster],
(SELECT
ATTRIBUTE_NAME,
ATTRIBUTE_VALUE,
VALUETYPE,
[SUPPORT],
[PROBABILITY],
VBA!Sqr([VARIANCE]) AS [StdDev],
[VARIANCE]
FROM NODE_DISTRIBUTION)
FROM <ModelName>.CONTENT
In particular, if you only care about mean and stddev of (or your model only contains) continuous attributes, here is a simplified version of the above query:
select /*FLATTENED*/
NODE_CAPTION AS [Cluster], // Optional -- only if you need the info for all clusters at once
(SELECT
ATTRIBUTE_NAME,
ATTRIBUTE_VALUE AS [Mean],
VBA!Sqr([VARIANCE]) AS [StdDev]
FROM NODE_DISTRIBUTION WHERE VALUETYPE=3) /*Only for rows representing existing state of continuous variables*/
FROM <ModelName>.CONTENT /*WHERE NODE_CAPTION='Cluster 1'*/
The plus indicates the nested table (NODE_DISTRIBUTION subquery)
To see the flattened results, just uncomment the /*FLATTENED*/ part of the query:
SELECT FLATTENED
NODE_CAPTION AS [Cluster], // Optional -- only if you need the info for all clusters at once
(SELECT
ATTRIBUTE_NAME,
ATTRIBUTE_VALUE AS [Mean],
VBA!Sqr([VARIANCE]) AS [StdDev]
FROM NODE_DISTRIBUTION WHERE VALUETYPE=3)
FROM <ModelName>.CONTENT /*WHERE NODE_CAPTION='Cluster 1'*/