Author: saqibkhan

  • What is R Markdown?

    R Markdown is a free and open-source R package that provides an authoring framework for data science projects. With it, we can write a single .Rmd file that combines narrative, code, and plots, and then render this file to a selected output format. The main characteristics of R Markdown are:

    • The resultant documents are shareable, fully reproducible, and of publication quality.
    • A wide range of static and dynamic output formats is supported, such as HTML, PDF, Microsoft Word, interactive documents, dashboards, reports, articles, books, presentations, applications, websites, and reusable templates.
    • Easy version control tracking.
    • Multiple programming languages are supported, including R, Python, and SQL.
  • What is RStudio?

    RStudio is an open-source IDE (integrated development environment) that is widely used as a graphical front end for the R programming language (R version 3.0.1 or later). It has many helpful features that make it very popular among R users:

    • User-friendly
    • Flexible
    • Multifunctional
    • Allows creating reusable scripts
    • Tracks operational history
    • Autocompletes the code
    • Offers detailed and comprehensive help on any object
    • Provides easy access to all imported data and built objects
    • Makes it easy to switch between terminal and console
    • Allows plot previewing
    • Supports efficient project creation and sharing
    • Can be used with other programming languages (Python, SQL, etc.)

    To learn more about what RStudio is and how to install it and begin using it, you can follow the RStudio Tutorial.

  • What is a factor in R?

    A factor in R is a data type for categorical data: each value is one of a predefined set of categories (aka levels). These categories look like character strings, but under the hood, they are stored as integer codes. Often, such categories have an intrinsic order. For example, a column in a data frame that contains the options of the Likert scale for assessing views (“strongly agree,” “agree,” “somewhat agree,” “neither agree nor disagree,” “somewhat disagree,” “disagree,” “strongly disagree”) should be stored as an ordered factor to capture this intrinsic order and reflect it correctly in plots of categorical data.
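
    The integer storage and the level order described above can be seen in a minimal sketch. The response data and the shortened four-level scale are made up for illustration; they are not from the original text.

```r
# Hypothetical survey responses on a shortened Likert scale (illustrative data)
responses <- c("agree", "disagree", "strongly agree", "agree")

# Levels listed from lowest to highest; ordered = TRUE records that order
likert_levels <- c("strongly disagree", "disagree", "agree", "strongly agree")
opinion <- factor(responses, levels = likert_levels, ordered = TRUE)

levels(opinion)       # the predefined set of categories
as.integer(opinion)   # the underlying integer codes: 3 2 4 3
opinion < "agree"     # comparisons respect the level order
table(opinion)        # counts appear in level order, including unused levels
```

    Passing the levels explicitly is a deliberate choice here: without the levels argument, R would sort them alphabetically, which would not match the scale.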

  • How to remove columns from a data frame in R?

    1. By using the select() function of the dplyr package of the tidyverse collection. The name of each column to delete is passed in with a minus sign before it:

    df <- select(df, -col_1, -col_3)

    If there are many columns to delete, it makes more sense to list the columns to keep rather than the columns to remove. In this case, the syntax is similar, but the names of the columns to keep aren’t preceded by a minus sign:

    df <- select(df, col_2, col_4)

    2. By using the built-in subset() function of base R. To delete only one column, we assign to the select parameter of the function the column name preceded by a minus sign. To delete more than one column, we assign to this parameter a vector of the column names, preceded by a minus sign:

    df <- subset(df, select=-col_1)
    df <- subset(df, select=-c(col_1, col_3))

    If there are many columns to delete, it makes more sense to list the columns to keep rather than the columns to remove. In this case, the syntax is similar, but no minus sign is added:

    df <- subset(df, select=col_2)
    df <- subset(df, select=c(col_2, col_4))
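
    As a rough end-to-end check of the two approaches, the sketch below builds a small made-up data frame (the column contents are assumptions for illustration) and confirms that dropping col_1 and col_3 gives the same result as keeping col_2 and col_4. The dplyr call is only run if that package happens to be installed.

```r
# Hypothetical four-column data frame used only for illustration
df <- data.frame(col_1 = 1:3, col_2 = c("a", "b", "c"),
                 col_3 = c(TRUE, FALSE, TRUE), col_4 = c(0.5, 1.5, 2.5))

# Drop col_1 and col_3 by negating them in the select argument
dropped <- subset(df, select = -c(col_1, col_3))
names(dropped)   # "col_2" "col_4"

# Same result, obtained by listing the columns to keep
kept <- subset(df, select = c(col_2, col_4))
identical(dropped, kept)

# The dplyr version, guarded so the sketch still runs without dplyr
if (requireNamespace("dplyr", quietly = TRUE)) {
  dropped_dplyr <- dplyr::select(df, -col_1, -col_3)
  print(names(dropped_dplyr))
}
```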
  • Rich Ecosystem

    R has a vast ecosystem of packages for various purposes, including data manipulation (dplyr), visualization (ggplot2), and machine learning (caret, tidymodels).

  • How do you add a new column to a data frame in R?

    1. Using the $ symbol:
    df <- data.frame(col_1=10:13, col_2=c("a", "b", "c", "d"))
    print(df)

    df$col_3 <- c(5, 1, 18, 16)
    print(df)

    Output:

      col_1 col_2
    1    10     a
    2    11     b
    3    12     c
    4    13     d
      col_1 col_2 col_3
    1    10     a     5
    2    11     b     1
    3    12     c    18
    4    13     d    16

    2. Using square brackets:
    df <- data.frame(col_1=10:13, col_2=c("a", "b", "c", "d"))
    print(df)

    df["col_3"] <- c(5, 1, 18, 16)
    print(df)

    Output:

      col_1 col_2
    1    10     a
    2    11     b
    3    12     c
    4    13     d
      col_1 col_2 col_3
    1    10     a     5
    2    11     b     1
    3    12     c    18
    4    13     d    16

    3. Using the cbind() function:
    df <- data.frame(col_1=10:13, col_2=c("a", "b", "c", "d"))
    print(df)

    df <- cbind(df, col_3=c(5, 1, 18, 16))
    print(df)

    Output:

      col_1 col_2
    1    10     a
    2    11     b
    3    12     c
    4    13     d
      col_1 col_2 col_3
    1    10     a     5
    2    11     b     1
    3    12     c    18
    4    13     d    16

    In each of the three cases, the new column can be a single value (which is recycled to every row), a vector with one element per row, or a value calculated from existing columns of that data frame or of other data frames.
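
    The three kinds of assigned values can be sketched as follows. The new column names and the second data frame other_df are hypothetical, introduced only for this illustration.

```r
df <- data.frame(col_1 = 10:13, col_2 = c("a", "b", "c", "d"))

# A single value is recycled to every row
df$source <- "survey"

# A new column computed from an existing column of the same data frame
df["col_1_doubled"] <- df$col_1 * 2

# A new column computed with the help of another data frame that has
# the same number of rows (other_df is a hypothetical example)
other_df <- data.frame(score = c(5, 1, 18, 16))
df <- cbind(df, score_plus_col_1 = other_df$score + df$col_1)

print(df)
```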

  • CRAN

    The Comprehensive R Archive Network (CRAN) hosts thousands of R packages, making it one of the largest repositories for statistical software. It serves as a crucial resource for users to find and install additional functionality.

  • Open Source

    R is free and open-source software, which means anyone can use, modify, and distribute it. This has led to a vibrant community contributing to its development.

  • Origin of the Name

    The name “R” comes from the first letter of the first names of its creators, Ross Ihaka and Robert Gentleman. It is also a play on the name of the earlier S programming language.

  • How to create a data frame in R?

    1. From one or more vectors of the same length—by using the data.frame() function:

    df <- data.frame(vector_1, vector_2)

    2. From a matrix—by using the data.frame() function:

    df <- data.frame(my_matrix)

    3. From a list of vectors of the same length—by using the data.frame() function:

    df <- data.frame(list_of_vectors)

    4. From other data frames:

    • To combine the data frames horizontally (only if they have the same number of rows and the rows describe the same records in the same order), use the cbind() function:
    df <- cbind(df1, df2)
    • To combine the data frames vertically (only if they have the same number of identically named columns; rbind() matches data frame columns by name, so the column order may differ, but the data types should agree to avoid unwanted coercion), use the rbind() function:
    df <- rbind(df1, df2)
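
    The snippets above assume the input objects already exist. A minimal runnable sketch, with made-up inputs (all values and column names here are assumptions for illustration), might look like this:

```r
# Illustrative inputs; all names and values are hypothetical
vector_1 <- 1:3
vector_2 <- c("x", "y", "z")
my_matrix <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("m1", "m2")))
list_of_vectors <- list(id = 1:3, label = c("x", "y", "z"))

df_vectors <- data.frame(vector_1, vector_2)   # column names come from the variable names
df_matrix  <- data.frame(my_matrix)            # column names come from the matrix
df_list    <- data.frame(list_of_vectors)      # column names come from the list names

# Horizontal combination: same number of rows, rows in the same order
df_wide <- cbind(df_list, df_matrix)

# Vertical combination: the same column names in both data frames
df_long <- rbind(df_list, data.frame(id = 4:5, label = c("v", "w")))

dim(df_wide)   # 3 rows, 4 columns
dim(df_long)   # 5 rows, 2 columns
```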