jk_uk2


Hi,

I'm new to data mining, and have created an MS decision trees model. The model has the columns age, call outcome, call reason, country name, employee name and gender - all as inputs.

In the mining model viewer, I only get nodes for age, despite having data for all the other columns.

Can anyone help?

Thanks

Jeremy




Re: MS Decision Trees Question

ggciubuc


You don't mention the key column or the predictable column. What is your objective?

The key column is the one that uniquely identifies each case,

and the predictable column (which can also be an input) is the one that can be influenced by the other attributes. See the other post.







Re: MS Decision Trees Question

jk_AnalysisServices

Thanks for the reply....

My objective is to predict whether customers will close their account, based on data such as calls made to a call centre, customer age, gender, location etc. My key column is CallID, and my predict column is a boolean Account Closed True/False.

It seems to be working better now that I have more data. My problem was that I was not seeing all my inputs on the decision tree. Therefore my question is: if I don't see the inputs I expect on the tree, is there any way to force them to appear, e.g. with algorithm parameters?

Also, I've not yet touched the 'Mining Model Prediction' tab. Is this something that I should be looking at? Is it the case that the standard 'mining model' that gets created is used to analyse actuals, whereas you have to use the 'Mining Model Prediction' tab for predictions?

If anyone can shed some light on the above, it would be appreciated.

Many thanks

Jeremy






Re: MS Decision Trees Question

ggciubuc

First, make AccountClosed both an input and predictable.

Second, how many states do your attributes have (call outcome, call reason, country name, employee name)? Too many states, at the level of thousands, will cause the algorithm to ignore an attribute. An input attribute should have around 100 states at most.

If an attribute has too many, you have to group its states. Take an attribute "city" with many values: you can group the cities into the states north_city, west_city, east_city and south_city, and use those in place of the original values.
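The grouping described above could be done in a preprocessing step before the data reaches the mining structure. This is only a minimal sketch: the city names, the `CITY_TO_REGION` lookup and the `other_city` fallback are made up for illustration and are not from the original data.

```python
# Hypothetical city-to-region lookup; the entries are illustrative only.
CITY_TO_REGION = {
    "Leeds": "north_city",
    "Bristol": "west_city",
    "Norwich": "east_city",
    "Brighton": "south_city",
}


def group_city(city, default="other_city"):
    """Map a raw city name onto one of a few coarse states."""
    return CITY_TO_REGION.get(city, default)


rows = [{"city": "Leeds"}, {"city": "Brighton"}, {"city": "Atlantis"}]

# Replace the high-cardinality value with its grouped state.
grouped = [{**row, "city": group_city(row["city"])} for row in rows]
```

Unknown cities fall into a single catch-all state here, so the attribute never grows beyond five states.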






Re: MS Decision Trees Question

jk_AnalysisServices

I've got about 3 - 10 states for each attribute.

Age is continuous;

Call outcome has 3;

Call reason has 4;

Country has 5.

Employee has 2.

Then I have about 15,000 records with combinations of these values. Is that OK?





Re: MS Decision Trees Question

ggciubuc

I would discretize Age and then eliminate all the duplicate records in the training data set.

Then reprocess the model.
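As a rough illustration of those two steps, the sketch below buckets a numeric age and then drops exact duplicate rows. The bucket edges, the field names (`age`, `gender`, `closed`) and the helper names are assumptions made for the example; they are not taken from the actual data set.

```python
def discretize_age(age, edges=(18, 30, 45, 60)):
    """Return a bucket label for a numeric age; the edges are assumed."""
    lower = None
    for edge in edges:
        if age < edge:
            return f"<{edge}" if lower is None else f"{lower}-{edge - 1}"
        lower = edge
    return f"{lower}+"


def prepare(records):
    """Bucket Age, then drop exact duplicate rows while keeping order."""
    seen = set()
    unique = []
    for rec in records:
        row = (discretize_age(rec["age"]), rec["gender"], rec["closed"])
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


records = [
    {"age": 25, "gender": "F", "closed": True},
    {"age": 27, "gender": "F", "closed": True},   # same bucket, so a duplicate
    {"age": 61, "gender": "M", "closed": False},
]
training_rows = prepare(records)
```

Dropping duplicates changes how much weight each combination of values carries, so it may be worth comparing the processed model with and without that step.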






Re: MS Decision Trees Question

jk_AnalysisServices

Ok.....thanks

Is there any need for me to use the 'Mining Model Prediction' tab? What does it get me?





Re: MS Decision Trees Question

ggciubuc

Let's say you train your model with these 15,000 records or fewer, so that some behaviors are discovered in your data; when one or more new customers come along, you can predict their behavior, with an associated probability.

In other words, you can say, with a given probability, whether or not their account will be closed.






Re: MS Decision Trees Question

jk_AnalysisServices

That really helps - thanks.

I get it now - it's just a query tool basically.

It's a strange place to put a query tool, unless I've misunderstood its purpose.





Re: MS Decision Trees Question

ggciubuc

I hope this post can be helpful for you too.





Re: MS Decision Trees Question

Jamie MacLennan

The reason the other attributes don't show in the tree is that their presence isn't supported by the data. Decision Trees greedily split on the attribute that provides the most information and then iterate on subsets of the data determined by the split. They continue to do this until there's nothing worth splitting on. You can control this decision by changing the COMPLEXITY_PENALTY parameter, although you run the risk of creating an overfit tree if you do so (that is, a tree that matches the input data very well but performs poorly when predicting data it hasn't seen).
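To make the greedy split concrete, here is a small sketch that scores each candidate attribute and picks the best one. It assumes plain entropy-based information gain as the score, which is a textbook choice and not necessarily the scoring method the Microsoft algorithm uses. The `min_gain` threshold is a hypothetical stand-in for "nothing worth splitting on"; it is not the COMPLEXITY_PENALTY parameter. The toy rows and column names are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_split(rows, attributes, target, min_gain=0.0):
    """Pick the attribute with the highest gain, or None if none helps."""
    gains = {a: information_gain(rows, a, target) for a in attributes}
    best = max(gains, key=gains.get)
    return (best, gains[best]) if gains[best] > min_gain else (None, 0.0)


rows = [
    {"age_band": "young", "gender": "F", "closed": True},
    {"age_band": "young", "gender": "M", "closed": True},
    {"age_band": "old", "gender": "F", "closed": False},
    {"age_band": "old", "gender": "M", "closed": False},
]

root = best_split(rows, ["age_band", "gender"], "closed")

# Within the "young" subset the outcome is already pure, so gender adds nothing.
young = [r for r in rows if r["age_band"] == "young"]
child = best_split(young, ["gender"], "closed")
```

In this toy data `age_band` separates the outcomes completely, so `gender` never gets a node. That is the same effect as a tree that only shows age.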

If you want to see how each input influences the output, you would be better off using an algorithm such as Naive Bayes, logistic regression, or neural networks, which consider all attributes independently.
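As a rough idea of what "considering each attribute independently" looks like, the sketch below builds the per-attribute conditional frequency tables that a Naive Bayes style model is based on. It uses raw counts with no smoothing and is not the Microsoft Naive Bayes implementation; the rows and column names are invented for the example.

```python
from collections import Counter, defaultdict


def conditional_tables(rows, attributes, target):
    """For each attribute, return P(state | target value) from raw counts."""
    class_counts = Counter(r[target] for r in rows)
    tables = {}
    for attr in attributes:
        counts = defaultdict(Counter)
        for r in rows:
            counts[r[target]][r[attr]] += 1
        tables[attr] = {
            cls: {state: n / class_counts[cls] for state, n in states.items()}
            for cls, states in counts.items()
        }
    return tables


rows = [
    {"call_reason": "billing", "gender": "F", "closed": True},
    {"call_reason": "billing", "gender": "M", "closed": True},
    {"call_reason": "query", "gender": "F", "closed": False},
    {"call_reason": "billing", "gender": "M", "closed": False},
]

tables = conditional_tables(rows, ["call_reason", "gender"], "closed")
```

Every attribute gets its own table regardless of how informative it is, which is why such models show all inputs while a tree shows only those it split on.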

As answered in another part of the thread, the Mining Model Prediction tab is there to run predictions against data. You can copy these queries into applications, reports, etc., or save the results to a database. If you are going to score a large amount of data to be put into a database, you should use Integration Services. In the Mining Model Prediction tab you can also switch to "singleton" mode, where you can manually input values for the prediction query. These values can be replaced with parameters in an application scenario for real-time prediction.
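For the application scenario, a parameterised singleton query might be assembled along these lines. This sketch only builds the DMX text; it does not connect to Analysis Services. The model name `ChurnTree`, the column names and the parameter names are assumptions for illustration, so substitute whatever the prediction tab generates for your own model.

```python
def singleton_query(model, predict_column, inputs):
    """Build DMX text for a singleton prediction with named parameters.

    `inputs` maps a model column name to a parameter name (without '@').
    """
    select_list = ", ".join(
        f"@{param} AS [{col}]" for col, param in inputs.items()
    )
    return (
        f"SELECT Predict([{predict_column}]), "
        f"PredictProbability([{predict_column}])\n"
        f"FROM [{model}]\n"
        f"NATURAL PREDICTION JOIN\n"
        f"(SELECT {select_list}) AS t"
    )


# Hypothetical model and column names.
query = singleton_query(
    "ChurnTree",
    "Account Closed",
    {"Age": "Age", "Gender": "Gender"},
)
print(query)
```

The parameter values would then be supplied by the client application at run time, one customer at a time.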






Re: MS Decision Trees Question

jk_AnalysisServices

Thanks for the comments,

I'll have a play around with the COMPLEXITY_PENALTY parameter and see what happens. I was interested in understanding how Decision Trees decide where to split, so I'll definitely take the comments on board and perhaps take a look at one of the other models.

Many thanks

Jeremy