
- predicting a dependent numeric variable based on other numeric variables.
- often the relationship between the dependent and independent variables is assumed to be a straight line of the form Y = aX + b.
- our job is to find the values of a and b that lead to the best predictions.
- normally uses ordinary least squares (OLS) estimation, which minimizes the sum of squared residuals.
- Regression can also be used for some classification tasks.
- Logistic Regression => classify an outcome as true or false.
- Poisson Regression => model counts of events.
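The notes use R, but the OLS fit for Y = aX + b has a simple closed form that can be sketched in a few lines of pure Python (the data here is made up for illustration):

```python
# Simple linear regression Y = a*X + b fitted by ordinary least squares.
# a = Cov(x, y) / Var(x); b = mean(y) - a * mean(x).

def fit_ols(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]      # toy data lying exactly on y = 2x + 1
a, b = fit_ols(xs, ys)
print(a, b)                # recovers a = 2.0, b = 1.0
```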

- Linear Regression
- involves fitting a straight line to the data.

- Decision Trees
- Regression Trees
- average the values of the examples at each leaf node to make numeric predictions.

- Model Trees
- build a regression model at each leaf node in a hybrid approach.

- Linear Regression

- Simple Linear Regression
- when there is only one independent variable.

- Multiple Linear Regression
- when there are multiple independent variables.

- most common approach for modelling numeric data.

- does not handle missing data.
- only works with numeric features; categorical features must first be converted to numeric (dummy/indicator) variables.
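Dummy coding is what makes a categorical feature usable by linear regression: each category level becomes its own 0/1 column. A minimal Python sketch (the `region` labels are invented for illustration; in R, `lm()` does this automatically for factor columns):

```python
# One-hot (dummy) encode a list of category labels into 0/1 columns,
# one column per distinct level.

def dummy_code(values):
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}

regions = ["north", "south", "north", "east"]
print(dummy_code(regions))
# {'east': [0, 0, 0, 1], 'north': [1, 0, 1, 0], 'south': [0, 1, 0, 0]}
```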

- Correlation denotes the strength of the relationship between two variables.
- Pearson's correlation is the most common measure.
- Pearson's correlation = Cov(X, Y) / (SD(X) * SD(Y))
- ranges from -1 to 1.

- data = read.csv("insurance.csv", stringsAsFactors=TRUE)
- str(data)
- summary(data$field)
- hist(data$field)
- cor(data[c("age", "bmi", "children", "expenses")])
- pairs(data[c("age", "bmi", "children", "expenses")]) => produces a scatterplot of each pair of attributes.
- pairs.panels(data[c("age", "bmi", "children", "expenses")]) => produces scatterplots plus the correlation of each pair of attributes (from the psych package).
- mymodel = lm(dependentVar ~ field1 + field2 + field3 + field4, data=data)
- mymodel = lm(dependentVar ~ ., data=data) => use all other columns as predictors.
- summary(mymodel)

- Suppose we need to convert a non-linear model into a linear one.
- To convert the non-linear model y = a + bx^2 to a linear model,
- we create a new variable z = x^2, so that the model becomes y = a + bz.
- A non-linear model may be appropriate depending on the use case; for example, the salary of a person could be a non-linear function of their age.
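The z = x^2 substitution can be checked numerically: generate data from y = 1 + 3x^2 (made-up coefficients), square the inputs, and an ordinary simple-regression fit on z recovers the coefficients. Python sketch:

```python
# y = a + b*x^2 becomes linear once we substitute z = x^2.

def fit_ols(zs, ys):
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    b = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
         / sum((z - mz) ** 2 for z in zs))
    return my - b * mz, b      # (intercept a, slope b)

xs = [1, 2, 3, 4]
ys = [4, 13, 28, 49]           # generated from y = 1 + 3*x^2
zs = [x ** 2 for x in xs]      # the substitution z = x^2
a, b = fit_ols(zs, ys)
print(a, b)                    # recovers a = 1.0, b = 3.0
```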

- Regression Trees
- part of the Classification and Regression Trees (CART) algorithm.

- Model Trees
- same as regression trees, but at each leaf node a multiple regression model is built from the examples reaching that node.
- slower to build than normal regression trees.
- generally more accurate than normal regression trees.

- decision trees can be used to model numeric data.
- useful when there are many features with complex relationships between them.
- can handle a large number of features, as it drops less important features by itself.
- in such cases, can perform better than linear regression.
- the model is easy to understand, just like any tree-based model.
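The core idea above, splitting the data and predicting the leaf average, can be sketched in Python with a single split chosen to minimize the squared error (the (x, y) pairs are made up; real CART recurses on each side):

```python
# One-split regression tree: try every threshold, keep the one that minimizes
# the total squared error, and predict the mean of each resulting leaf.

def leaf_mean(ys):
    return sum(ys) / len(ys)

def sse(ys):
    m = leaf_mean(ys)
    return sum((y - m) ** 2 for y in ys)

def one_split_tree(data):
    """Return (threshold, left_leaf_mean, right_leaf_mean)."""
    best = None
    for t in sorted(x for x, _ in data)[1:]:
        left = [y for x, y in data if x < t]
        right = [y for x, y in data if x >= t]
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            best = (err, t, leaf_mean(left), leaf_mean(right))
    return best[1:]

data = [(1, 10), (2, 12), (3, 11), (7, 30), (8, 32), (9, 31)]
print(one_split_tree(data))   # splits at x >= 7; leaf means 11.0 and 31.0
```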

- not used as often as linear regression.
- requires a large amount of training data.
- slower than a normal linear regression model.
- interpretation of the model can sometimes be difficult, particularly for large trees.

- wines_train = read.csv("wines.csv")
- model = rpart(quality ~ ., data=wines_train)   # from the rpart package
- rpart.plot(model, digits=3)   # from the rpart.plot package
- rpart.plot(model, digits=4, fallen.leaves=TRUE, type=3, extra=101)
- pred = predict(model, wines_test)
- summary(pred)
- # building the model tree
- modeltree = M5P(quality ~ ., data=wines_train)   # from the RWeka package
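The difference between M5P's model tree and a plain regression tree is what sits at the leaf: a fitted regression model instead of a constant mean. A pure-Python sketch with a fixed, hand-picked threshold and invented piecewise-linear data (M5P itself learns the splits and does smoothing and pruning on top of this):

```python
# Model tree with one split: route the query to a leaf, then predict with a
# simple linear model fitted on the training examples that reached that leaf.

def fit_ols(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b      # (intercept, slope)

def model_tree_predict(data, threshold, x_new):
    left = [(x, y) for x, y in data if x < threshold]
    right = [(x, y) for x, y in data if x >= threshold]
    leaf = left if x_new < threshold else right
    a, b = fit_ols([x for x, _ in leaf], [y for _, y in leaf])
    return a + b * x_new

# Piecewise-linear toy data: y = x below x=5, y = 4x above.
data = [(1, 1), (2, 2), (3, 3), (6, 24), (7, 28), (8, 32)]
print(model_tree_predict(data, 5, 2.5))   # left leaf fits y = x   -> 2.5
print(model_tree_predict(data, 5, 7.5))   # right leaf fits y = 4x -> 30.0
```

A regression tree would have answered with the leaf means (2.0 and 28.0 here); the per-leaf models interpolate within each region, which is why model trees are usually more accurate.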