Author: saqibkhan

  • Lower Speed

    R is often slower than other programming languages such as MATLAB and Python, and many R packages run more slowly than their counterparts in those languages.

    In R, algorithms are spread across many different packages, so programmers with no prior knowledge of those packages may find it difficult to implement algorithms.

  • Weak Origin

    A major disadvantage of R is its limited support for dynamic or 3D graphics. The reason behind this is its origin: R shares its roots with a much older programming language, “S”.

  • Complicated Language

    R is a complicated language with a steep learning curve. People without prior programming experience may find it difficult to learn.

  • Analysis of Covariance

    We use regression analysis to create models that describe the effect of variation in predictor variables on the response variable. When the data include a categorical variable with values such as Yes/No or Male/Female, a simple regression analysis gives a separate result for each value of that categorical variable. In such a scenario, we can study the effect of the categorical variable by using it together with the predictor variable and comparing the regression lines for each level of the categorical variable. Such an analysis is called Analysis of Covariance, or ANCOVA.

    Example

    Consider the built-in R data set mtcars. In it, the field “am” represents the type of transmission (automatic or manual). It is a categorical variable with values 0 (automatic) and 1 (manual). Besides horsepower (“hp”), the miles per gallon value (“mpg”) of a car can also depend on it.

    We study the effect of “am” on the regression between “mpg” and “hp” by fitting the models with the aov() function and then comparing them with the anova() function.

    Input Data

    Create a data frame containing the fields “mpg”, “hp” and “am” from the data set mtcars. Here we take “mpg” as the response variable, “hp” as the predictor variable and “am” as the categorical variable.

    input <- mtcars[,c("am","mpg","hp")]
    print(head(input))

    When we execute the above code, it produces the following result −

                       am   mpg   hp
    Mazda RX4          1    21.0  110
    Mazda RX4 Wag      1    21.0  110
    Datsun 710         1    22.8   93
    Hornet 4 Drive     0    21.4  110
    Hornet Sportabout  0    18.7  175
    Valiant            0    18.1  105
    


    ANCOVA Analysis

    We create a regression model taking “hp” as the predictor variable and “mpg” as the response variable taking into account the interaction between “am” and “hp”.
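    In an R model formula, `*` is shorthand: `hp * am` expands to the two main effects plus their interaction term `hp:am`. As a small illustration, base R's terms() function lists the terms that the formula expands to:

```r
# The interaction model formula used in this section.
f <- mpg ~ hp * am

# terms() expands the formula; its "term.labels" attribute lists every model term.
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
```

    The same expansion is why the summary below reports separate rows for hp, am and hp:am.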

    Model with interaction between categorical variable and predictor variable

    # Get the dataset.
    input <- mtcars
    
    # Create the regression model.
    result <- aov(mpg ~ hp * am, data = input)
    print(summary(result))

    When we execute the above code, it produces the following result −

                Df Sum Sq Mean Sq F value   Pr(>F)    
    hp           1  678.4   678.4  77.391 1.50e-09 ***
    am           1  202.2   202.2  23.072 4.75e-05 ***
    hp:am        1    0.0     0.0   0.001    0.981    
    Residuals   28  245.4     8.8                     
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    

    This result shows that both horsepower and transmission type have a significant effect on miles per gallon, as the p-value in both cases is less than 0.05. However, the interaction between these two variables is not significant, as its p-value (0.981) is greater than 0.05.
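    Instead of reading the p-values off the printed table, they can also be extracted programmatically. A minimal sketch, assuming the same interaction model on mtcars: summary() of an aov object returns a list whose first element is the ANOVA table, and its "Pr(>F)" column holds the p-values (NA for the Residuals row).

```r
# Fit the interaction model on the built-in mtcars data.
result <- aov(mpg ~ hp * am, data = mtcars)

# The first element of summary() is the ANOVA table (a data frame).
tab <- summary(result)[[1]]

# Extract the p-values and name them after the (whitespace-trimmed) row names.
p_values <- tab[["Pr(>F)"]]
names(p_values) <- trimws(rownames(tab))

# TRUE where a term is significant at the 0.05 level used in the text.
print(p_values < 0.05)
```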

    Model without interaction between categorical variable and predictor variable

    # Get the dataset.
    input <- mtcars
    
    # Create the regression model.
    result <- aov(mpg ~ hp + am, data = input)
    print(summary(result))

    When we execute the above code, it produces the following result −

                Df Sum Sq Mean Sq F value   Pr(>F)    
    hp           1  678.4   678.4   80.15 7.63e-10 ***
    am           1  202.2   202.2   23.89 3.46e-05 ***
    Residuals   29  245.4     8.5                     
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    

    This result shows that both horsepower and transmission type have a significant effect on miles per gallon, as the p-value in both cases is less than 0.05.

    Comparing Two Models

    Now we can compare the two models to determine whether the interaction between the variables is truly insignificant. For this we use the anova() function.

    # Get the dataset.
    input <- mtcars
    
    # Create the regression models.
    result1 <- aov(mpg ~ hp * am, data = input)
    result2 <- aov(mpg ~ hp + am, data = input)
    
    # Compare the two models.
    print(anova(result1, result2))

    When we execute the above code, it produces the following result −

    Model 1: mpg ~ hp * am
    Model 2: mpg ~ hp + am
      Res.Df    RSS Df  Sum of Sq     F Pr(>F)
    1     28 245.43                           
    2     29 245.44 -1 -0.0052515 6e-04 0.9806
    

    As the p-value (0.9806) is greater than 0.05, we conclude that the interaction between horsepower and transmission type is not significant. So miles per gallon depends on horsepower in a similar manner for both automatic and manual transmissions.
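    This conclusion can also be checked by fitting a separate regression line of mpg on hp for each transmission type and comparing the slopes. The sketch below uses base R's subset() and lm(); with am coded 0/1, the hp:am coefficient of the full interaction model is exactly the difference between the two separately fitted slopes, so a difference close to zero matches the non-significant interaction.

```r
# Split mtcars by transmission type (am = 0 automatic, am = 1 manual).
auto   <- subset(mtcars, am == 0)
manual <- subset(mtcars, am == 1)

# Fit a separate simple regression of mpg on hp for each level.
fit_auto   <- lm(mpg ~ hp, data = auto)
fit_manual <- lm(mpg ~ hp, data = manual)

slopes <- c(auto = coef(fit_auto)[["hp"]], manual = coef(fit_manual)[["hp"]])
print(slopes)

# In the interaction model, hp:am is the manual slope minus the automatic slope.
fit_int <- lm(mpg ~ hp * am, data = mtcars)
print(c(interaction = coef(fit_int)[["hp:am"]],
        slope_difference = slopes[["manual"]] - slopes[["auto"]]))
```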

  • Basic Security

    R lacks basic security features, which are an essential part of most programming languages such as Python. Because of this, R has several restrictions; for example, it is not well suited to being embedded in a web application.

  • Data Handling

    In R, objects are stored in physical memory. R tends to use more memory than languages such as Python, and it requires the entire dataset to be held in one place, in memory. This makes it a poor fit when dealing with Big Data.
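    To see this in-memory model in practice, base R's object.size() estimates how many bytes an object occupies. A minimal illustration, where the vector length of one million is an arbitrary choice: each double takes 8 bytes, so the whole vector (roughly 8 MB) must fit in RAM at once.

```r
# A numeric vector of one million doubles (size chosen only for illustration).
x <- numeric(1e6)

# object.size() estimates the memory the object occupies in RAM, in bytes.
size <- object.size(x)
print(format(size, units = "auto"))
```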

  • Continuously Growing

    R is a constantly evolving programming language: like our taste in music or clothes, which changes as we get older, it keeps changing and developing over time. R is state of the art and receives updates whenever a new feature is added.

  • Poisson Regression

    Poisson regression involves regression models in which the response variable takes the form of counts rather than fractional numbers, for example the number of births or the number of wins in a football match series. The values of the response variable are also assumed to follow a Poisson distribution.
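    A Poisson-distributed count can only take the values 0, 1, 2, ..., and the probability of observing the count k is lambda^k * exp(-lambda) / k!, where lambda is the mean count. A small sketch with base R's dpois() compares it against that formula; lambda = 2 is an arbitrary value chosen for illustration.

```r
# Poisson probabilities P(Y = k) for k = 0..5 with mean lambda = 2.
lambda <- 2
k <- 0:5
probs <- dpois(k, lambda)

# The same probabilities from the formula lambda^k * exp(-lambda) / k!
by_formula <- lambda^k * exp(-lambda) / factorial(k)
print(round(probs, 4))
```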

    The general mathematical equation for Poisson regression is −

    $$\log(y) = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$
    

    Following is the description of the parameters used −

    • y is the response variable (strictly, the model describes the logarithm of its expected count).
    • a and b1, b2, ..., bn are the numeric coefficients.
    • x1, x2, ..., xn are the predictor variables.

    The function used to create the Poisson regression model is the glm() function.

    Syntax

    The basic syntax for glm() function in Poisson regression is −

    glm(formula,data,family)
    

    Following is the description of the parameters used in above functions −

    • formula is the symbolic expression representing the relationship between the variables.
    • data is the data set giving the values of these variables.
    • family is an R object specifying the details of the model. Its value is poisson for Poisson regression.
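    The poisson family uses the log link by default, which is what gives the log(y) form of the model equation. A short sketch, using the built-in warpbreaks data set from the example that follows: passing family = poisson or the family object poisson(link = "log") fits the same model.

```r
# The default link function of the poisson family.
fam <- poisson()
print(fam$link)

# Passing the family function, or the family object with an explicit log link,
# fits the same model.
m1 <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)
m2 <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson(link = "log"))
print(all.equal(coef(m1), coef(m2)))
```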

    Example

    We have the built-in data set “warpbreaks”, which describes the effect of wool type (A or B) and tension (low, medium or high) on the number of warp breaks per loom. Let’s consider “breaks” as the response variable; it is a count of the number of breaks. The wool type (“wool”) and “tension” are taken as predictor variables.

    Input Data

    input <- warpbreaks
    print(head(input))

    When we execute the above code, it produces the following result −

          breaks   wool  tension
    1     26       A     L
    2     30       A     L
    3     54       A     L
    4     25       A     L
    5     70       A     L
    6     52       A     L
    

    Create Regression Model

    output <- glm(formula = breaks ~ wool + tension, data = warpbreaks,
       family = poisson)
    print(summary(output))

    When we execute the above code, it produces the following result −

    Call:
    glm(formula = breaks ~ wool + tension, family = poisson, data = warpbreaks)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -3.6871  -1.6503  -0.4269   1.1902   4.2616  
    
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
    (Intercept)  3.69196    0.04541  81.302  < 2e-16 ***
    woolB       -0.20599    0.05157  -3.994 6.49e-05 ***
    tensionM    -0.32132    0.06027  -5.332 9.73e-08 ***
    tensionH    -0.51849    0.06396  -8.107 5.21e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for poisson family taken to be 1)
    
        Null deviance: 297.37  on 53  degrees of freedom
    Residual deviance: 210.39  on 50  degrees of freedom
    AIC: 493.06
    
    Number of Fisher Scoring iterations: 4

    In the summary, we look for a p-value in the last column below 0.05 to conclude that a predictor variable has an impact on the response variable. Here wool type B and tension levels M and H all have p-values well below 0.05, so each has a significant impact on the count of breaks compared with the baseline of wool A at low tension.
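    Because the model is fitted on the log scale, a predicted count is obtained by exponentiating the sum of the relevant coefficients. A short sketch, with wool B at tension H chosen arbitrarily as an example loom, compares predict() with type = "response" against that hand calculation; the negative woolB and tensionH estimates mean fewer expected breaks than the baseline.

```r
# Refit the Poisson model from the example above.
output <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# One hypothetical loom: wool B at high tension, using the data set's factor levels.
new_loom <- data.frame(
  wool    = factor("B", levels = levels(warpbreaks$wool)),
  tension = factor("H", levels = levels(warpbreaks$tension))
)
pred <- predict(output, newdata = new_loom, type = "response")

# Same value by hand: exp(intercept + woolB + tensionH).
b <- coef(output)
by_hand <- exp(b[["(Intercept)"]] + b[["woolB"]] + b[["tensionH"]])
print(c(predict = unname(pred), by_hand = by_hand))
```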

  • Statistics

    R is mainly known as the language of statistics. This is the main reason R is preferred over other programming languages for developing statistical tools.

  • The array of packages

    R has a rich set of packages: there are over 10,000 packages in the CRAN repository, and the number keeps growing. R provides packages for data science and machine learning tasks.