2024 – Page 81

Survival analysis deals with predicting the time at which a specific event will occur. It is also known as failure time analysis or analysis of time to death. Examples include predicting the number of days a person with cancer will survive, or predicting when a mechanical system will fail.

The R package named survival is used to carry out survival analysis. It contains the function Surv(), which combines a follow-up time and an event indicator into a survival object, typically used as the response in an R formula. The function survfit() then estimates a survival curve from that object, and the fitted curve can be plotted.

Install Package

Run install.packages("survival") in the R console to install the package.

Syntax

The basic syntax for survival analysis in R is Surv(time, event) followed by survfit(formula). Here time is the follow-up time until the event occurs, event indicates whether the expected event occurred, and formula describes the relationship between the survival object and the predictor variables.

Example

We will consider the data set named “pbc” that ships with the survival package. It contains survival data for people affected by primary biliary cirrhosis (PBC) of the liver. Among the many columns in the data set, we are primarily concerned with the fields “time” and “status”. The field “time” is the number of days between registration of the patient and the earliest of a liver transplant, the death of the patient, or the end of study follow-up. The field “status” records the patient's state at that point (0 for censored, 1 for transplant, 2 for death).

Applying Surv() and survfit() Function

We then apply the Surv() function to “time” and “status”, pass the result to survfit(), and plot the fitted curve to show how the estimated survival probability changes over time.
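As a minimal sketch of the Surv() and survfit() workflow described above, the following fits a single survival curve to the pbc data. Treating death (status equal to 2) as the event and everything else as censored is an assumption made for this illustration, as are the variable name km_fit and the axis labels.

```r
# Assumes the survival package is installed: install.packages("survival")
library(survival)

# pbc ships with the survival package; look at the two fields of interest.
head(pbc[, c("time", "status")])

# Assumption for this sketch: death (status == 2) is the event of interest;
# transplanted and censored patients are both treated as censored.
km_fit <- survfit(Surv(time, status == 2) ~ 1, data = pbc)

# Print a summary of the fitted curve (number of subjects, events, median).
print(km_fit)

# Plot the estimated survival probability against follow-up time.
plot(km_fit,
     xlab = "Days since registration",
     ylab = "Estimated survival probability")
```

The formula has ~ 1 on the right-hand side because no predictor variables are used; replacing it with a column name such as sex would fit one curve per group.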

Random Forest

4. Statistics Examples

In the random forest approach, a large number of decision trees are created. Every observation is fed into every decision tree, and the most common outcome for each observation is used as the final output. A new observation is likewise fed into all the trees, and a majority vote across the trees determines its classification.

An error estimate is made for the cases that were not used while building each tree. This is called the OOB (out-of-bag) error estimate and is reported as a percentage.

The R package “randomForest” is used to create random forests.

Install R Package

Run install.packages("randomForest") in the R console to install the package. You also have to install any dependent packages. The package provides the function randomForest(), which is used to create and analyze random forests.

Syntax

The basic syntax for creating a random forest in R is randomForest(formula, data). Here formula describes the response and predictor variables, and data is the data set to use.

Input Data

We will use the data set named readingSkills, which ships with the party package, to create a random forest. It records each person's “age”, “shoeSize”, and reading “score”, together with whether the person is a native speaker (“nativeSpeaker”).

Example

We will use the randomForest() function to fit the forest and then inspect the fitted model, including its OOB error estimate and the importance of each variable.

Conclusion

From the random forest fitted on this data, we can conclude that shoeSize and score are the important factors in deciding whether someone is a native speaker. The model has an OOB error estimate of only 1%, which suggests a prediction accuracy of roughly 99% on data of this kind.
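The random forest workflow described above can be sketched as follows. This is an illustration under stated assumptions rather than the original listing: the seed value, the variable names rf_fit and oob_error, and the choice to use all rows of readingSkills are made up for the example. Because the trees are built from random samples, the exact error figure may differ from run to run and from the value quoted in the text.

```r
# Assumes both packages are installed:
# install.packages(c("party", "randomForest"))
library(randomForest)

# readingSkills ships with the party package.
data("readingSkills", package = "party")

# Fix the random seed so the sketch is repeatable (arbitrary value).
set.seed(42)

# Predict nativeSpeaker from the three remaining columns.
rf_fit <- randomForest(nativeSpeaker ~ age + shoeSize + score,
                       data = readingSkills)

# The printed summary includes the OOB error estimate and confusion matrix.
print(rf_fit)

# OOB error rate after the final tree, as a proportion between 0 and 1.
oob_error <- rf_fit$err.rate[rf_fit$ntree, "OOB"]
print(oob_error)

# Variable importance (mean decrease in Gini impurity per predictor).
print(importance(rf_fit))
```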

Community and Ecosystem

3. History

Recent Developments

Integration and Interoperability

Decision Tree

A decision tree is a graph that represents choices and their results in the form of a tree. The nodes in the graph represent an event or choice, and the edges represent the decision rules or conditions. Decision trees are widely used in machine learning and data mining applications using R.

Examples of the use of decision trees include predicting whether an email is spam or not spam, predicting whether a tumor is cancerous, and predicting whether a loan is a good or bad credit risk, based on the factors in each of these cases.

Generally, a model is created with observed data, also called training data. Then a set of validation data is used to verify and improve the model. For a new set of predictor variables, we use this model to arrive at a decision on the category (yes/no, spam/not spam) of the data. R has packages which are used to create and visualize decision trees; the R package “party” is used here.

Install R Package

Run install.packages("party") in the R console to install the package. You also have to install any dependent packages. The package “party” has the function ctree(), which is used to create and analyze decision trees.

Syntax

The basic syntax for creating a decision tree in R is ctree(formula, data). Here formula describes the response and predictor variables, and data is the data set to use.

Input Data

We will use the data set named readingSkills, which ships with the party package, to create a decision tree. It records each person's “age”, “shoeSize”, and reading “score”, together with whether the person is a native speaker (“nativeSpeaker”).

Example

We will use the ctree() function to create the decision tree and plot its graph.

Conclusion

From the decision tree fitted on this data, we can conclude that anyone whose readingSkills score is less than 38.3 and whose age is more than 6 is not a native speaker.
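A minimal sketch of the ctree() workflow described above follows, including the split into training and validation data. The split point (the first 105 rows for training, the rest for validation) and the variable names are assumptions made for this illustration, so the split values in the resulting tree are not guaranteed to match those quoted in the conclusion.

```r
# Assumes the party package is installed: install.packages("party")
library(party)

# readingSkills ships with the party package.
data("readingSkills", package = "party")

# Hypothetical split: first 105 rows for training, the rest for validation.
train_data <- readingSkills[1:105, ]
valid_data <- readingSkills[106:nrow(readingSkills), ]

# Fit a conditional inference tree predicting nativeSpeaker.
tree_fit <- ctree(nativeSpeaker ~ age + shoeSize + score, data = train_data)

# Show the split rules as text, then draw the tree.
print(tree_fit)
plot(tree_fit)

# Classify the held-out rows and measure the share predicted correctly.
valid_pred <- predict(tree_fit, newdata = valid_data)
valid_accuracy <- mean(valid_pred == valid_data$nativeSpeaker)
print(valid_accuracy)
```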
