Category: Interview Questions


  • How to merge data in R?

    1. Using the cbind() function—only if the data frames have the same number of rows, and the records are the same and in the same order:

    df <- cbind(df1, df2)

    2. Using the rbind() function to combine the data frames vertically—only if they have an equal number of identically named columns of the same data type and appearing in the same order:

    df <- rbind(df1, df2)

    3. Using the merge() function to merge data frames by a column in common, usually an ID column:

    • Inner join:
    df <- merge(df1, df2, by="ID")
    • Left join:
    df <- merge(df1, df2, by="ID", all.x=TRUE)
    • Right join:
    df <- merge(df1, df2, by="ID", all.y=TRUE)
    • Outer join:
    df <- merge(df1, df2, by="ID", all=TRUE)

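    4. Using the join functions of the dplyr package to merge data frames by a column in common, usually an ID column:

    df <- left_join(df1, df2, by="ID")

    dplyr provides one function per join type: inner_join(), left_join(), right_join(), and full_join(). The older plyr package offers a single join() function whose type parameter takes one of the values "inner", "left", "right", or "full".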
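
    As a rough illustration of the merge() variants listed above, the following sketch uses base R only. The data frames df1 and df2, their columns, and their values are made up for the example.

```r
# Two small hypothetical data frames sharing an "ID" column.
df1 <- data.frame(ID = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
df2 <- data.frame(ID = c(2, 3, 4), score = c(85, 90, 70))

inner <- merge(df1, df2, by = "ID")                # IDs present in both: 2, 3
left  <- merge(df1, df2, by = "ID", all.x = TRUE)  # every row of df1: 1, 2, 3
right <- merge(df1, df2, by = "ID", all.y = TRUE)  # every row of df2: 2, 3, 4
outer <- merge(df1, df2, by = "ID", all = TRUE)    # every ID: 1, 2, 3, 4

# Rows without a match are filled with NA, e.g. the score for ID 1 in the left join.
is.na(left$score[left$ID == 1])

# rbind() stacks rows when the column names match;
# cbind() places columns side by side when the row counts match.
stacked      <- rbind(df1, data.frame(ID = 5, name = "Dee"))
side_by_side <- cbind(df1, flag = c(TRUE, FALSE, TRUE))
```

    Note that merge() sorts the result by the key column by default, so the row order may differ from the inputs.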

  • How to aggregate data in R?

    To aggregate data in R, we use the aggregate() function. This function has the following essential parameters, in this order:

    • x: the data frame to aggregate.
    • by: a list of the factors to group by.
    • FUN: an aggregate function to compute the summary statistics for each group (e.g., mean, max, min, length, sum).
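
    A minimal sketch of aggregate() with these three parameters might look as follows. The sales data frame and its columns are invented for illustration.

```r
# Hypothetical data: one row per sale.
sales <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  amount = c(10, 20, 5, 15, 10)
)

# x   = the column(s) to summarise,
# by  = a named list of grouping factors,
# FUN = the summary function applied to each group.
totals <- aggregate(x = sales["amount"],
                    by = list(region = sales$region),
                    FUN = sum)

# The formula interface expresses the same grouping more compactly.
means <- aggregate(amount ~ region, data = sales, FUN = mean)

totals
means
```

    The formula form (amount ~ region) is often easier to read when grouping by columns of the same data frame.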
  • What types of loops exist in R, and what is the syntax of each type?

    1. For loop: iterates over a sequence once per item (unless break or next is used) and performs the same set of operations on each item of that sequence. This is the most common type of loop. The syntax of a for loop in R is the following:

    for (variable in sequence) {
    
    operations
    }

    2. While loop: repeats the same set of operations for as long as a predefined logical condition (or combination of conditions) remains TRUE, unless break ends the loop early or next skips to the following iteration. Unlike for loops, we usually don’t know in advance how many iterations a while loop will execute. Before running a while loop, we need to assign the variable (or variables) used in the condition and then update its value inside the loop body at each iteration. The syntax of a while loop in R is the following:

    variable assignment
    
    while (logical condition) {
    
    operations
    variable update
    }

    3. Repeat loop: repeatedly performs the same set of operations until a predefined break condition (or one of several break conditions) is met. To introduce such a condition, a repeat loop has to contain an if statement whose body includes the break statement. As with while loops, we don’t know in advance how many iterations a repeat loop will execute. The syntax of a repeat loop in R is the following:

    repeat { 
    
    operations 
    if(break condition) {
        break
    }
    }
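
    The three loop templates above can be filled in as in this short sketch. The variable names and the numbers are arbitrary and chosen only to keep the arithmetic easy to follow.

```r
# for: iterate over a known sequence; next skips the current item.
total <- 0
for (i in 1:5) {
  if (i == 3) next        # skip 3
  total <- total + i      # adds 1, 2, 4, 5
}

# while: keep going as long as the condition is TRUE.
n <- 1                    # variable assignment before the loop
while (n < 100) {
  n <- n * 2              # variable update inside the loop
}

# repeat: loop until an explicit break.
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) {
    break
  }
}

total   # 12
n       # 128
count   # 3
```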
  • What are the requirements for naming variables in R?

    • A variable name can be a combination of letters, digits, dots, and underscores. It can’t contain any other symbols, including white spaces.
    • A variable name must start with a letter or a dot.
    • If a variable name starts with a dot, this dot can’t be followed by a digit.
    • Reserved words in R (TRUE, for, NULL, etc.) can’t be used as variable names.
    • Variable names are case-sensitive.
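
    One way to check these rules is to compare a candidate name with the output of base R's make.names(), which rewrites invalid names into valid ones. The helper is_valid_name() and the candidate names below are hypothetical and exist only for this sketch.

```r
# A name is treated as syntactically valid if make.names() leaves it unchanged.
is_valid_name <- function(x) make.names(x) == x

candidates <- c("my_var", ".hidden", "x.1",   # valid
                "2x",                          # starts with a digit
                "my var",                      # contains a space
                ".2x",                         # dot followed by a digit
                "for",                         # reserved word
                "_x")                          # starts with an underscore
validity <- setNames(is_valid_name(candidates), candidates)
validity

# Names are case-sensitive: these are two different variables.
value <- 1
Value <- 2
```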
  • How to assign a value to a variable in R?

    1. Using the assignment operator <-, e.g., my_var <- 1. This is the most common way of assigning a value to a variable in R.
    2. Using the equal operator =, e.g., my_var = 1. It also works for ordinary assignment but is mainly used for passing values to arguments in function calls and for setting default values in function definitions.
    3. Using the rightward assignment operator ->, e.g., 1 -> my_var. It can be convenient at the end of a pipe.
    4. Using the global assignment operators, either leftward (<<-) or rightward (->>), e.g., my_var <<- 1. Inside a function, they modify a variable of that name in an enclosing environment, or create it in the global environment if none exists.
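
    The operators can be compared side by side in a short sketch. All names here (a, b, c_val, make_counter, counter) are illustrative and not part of any library.

```r
a <- 1          # leftward assignment, the usual style
b = 2           # = also assigns at the top level
3 -> c_val      # rightward assignment: the value comes first

# <<- modifies a variable in an enclosing environment.
# Here each call to the inner function updates `count`,
# which lives in the environment created by make_counter().
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first  <- counter()   # 1
second <- counter()   # 2
```

    With a plain <- inside the inner function, count would be a new local variable on every call and the counter would never advance.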
  • List some popular data visualization packages in R.

    • ggplot2: the most popular R data visualization package, allowing the creation of a wide variety of plots.
    • lattice: for displaying multivariate data as a tiled panel (trellis) of several plots.
    • plotly: for creating interactive, publication-quality charts.
    • highcharter: for easy dynamic plotting; offers many flexible features, plugins, and themes, and allows charting different R objects with one function.
    • leaflet: for creating interactive maps.
    • ggvis: for creating interactive and highly customizable plots that can be accessed in any browser by using Shiny’s infrastructure.
    • patchwork: for combining several plots, usually of various types, on the same graphic.
  • How to create a user-defined function in R?

    To create a user-defined function in R, we use the keyword function and the following syntax:

    function_name <- function(parameters){
    
    function body 
    }
    1. Function name—the name of the function object that will be used for calling the function after its definition.
    2. Function parameters—the variables separated with a comma and placed inside the parentheses that will be set to actual argument values each time we call the function.
    3. Function body—a chunk of code in the curly brackets containing the operations to be performed in a predefined order on the input arguments each time we call the function. Usually, the function body contains the return() statement (or statements) that returns the function output, or the print() statement (or statements) to print the output.

    An example of a simple user-defined function in R:

    my_function <- function(x, y){
    
    return(x + y)
    }
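
    Building on the example above, the sketch below shows how such a function might be called and how a default parameter value behaves. The function add_numbers and its default of 10 are hypothetical.

```r
# Variant of the example above with a default value for y.
add_numbers <- function(x, y = 10) {
  # The last evaluated expression is returned,
  # so an explicit return() is optional here.
  x + y
}

res_positional <- add_numbers(1, 2)          # arguments matched by position
res_default    <- add_numbers(5)             # y falls back to its default, 10
res_named      <- add_numbers(y = 1, x = 2)  # named arguments in any order

res_positional  # 3
res_default     # 15
res_named       # 3
```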
  • What is R Markdown?

    R Markdown is a free and open-source R package that provides an authoring framework for building data science projects. Using it, we can write a single .Rmd file that combines narrative, code, and data plots, and then render this file in a selected output format. The main characteristics of R Markdown are:

    • The resultant documents are shareable, fully reproducible, and of publication quality.
    • A wide range of static and dynamic outputs and formats, such as HTML, PDF, Microsoft Word, interactive documents, dashboards, reports, articles, books, presentations, applications, websites, reusable templates, etc.
    • Easy version control tracking.
    • Multiple programming languages are supported, including R, Python, and SQL.
  • What is RStudio?

    RStudio is an open-source IDE (integrated development environment) that is widely used as a graphical front-end for working with the R programming language (R version 3.0.1 or later). It has many helpful features that make it very popular among R users:

    • User-friendly
    • Flexible
    • Multifunctional
    • Allows creating reusable scripts
    • Tracks operational history
    • Autocompletes the code
    • Offers detailed and comprehensive help on any object
    • Provides easy access to all imported data and built objects
    • Makes it easy to switch between terminal and console
    • Allows plot previewing
    • Supports efficient project creation and sharing
    • Can be used with other programming languages (Python, SQL, etc.)

    To learn more about what RStudio is and how to install it and begin using it, you can follow the RStudio Tutorial.

  • What is a factor in R?

    A factor in R is a specific data type that accepts categories (aka levels) from a predefined set of possible values. These categories look like characters, but under the hood, they are stored as integers. Often, such categories have an intrinsic order. For example, a column in a data frame that contains the options of the Likert scale for assessing views (“strongly agree,” “agree,” “somewhat agree,” “neither agree nor disagree,” “somewhat disagree,” “disagree,” “strongly disagree”) should be of factor type to capture this intrinsic order and adequately reflect it on the categorical types of plots.
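
    The integer storage and the intrinsic order can be seen in a small sketch. The shortened four-level scale and the responses vector are made up for the example.

```r
# Hypothetical survey answers on a shortened Likert scale.
responses     <- c("agree", "disagree", "strongly agree", "agree")
likert_levels <- c("strongly disagree", "disagree", "agree", "strongly agree")

# levels fixes the set of allowed categories and their order;
# ordered = TRUE makes comparisons between categories meaningful.
f <- factor(responses, levels = likert_levels, ordered = TRUE)

levels(f)               # the predefined categories, in order
as.integer(f)           # underlying integer codes: 3 2 4 3
f < "strongly agree"    # TRUE TRUE FALSE TRUE
table(f)                # counts per level, including unused levels
```

    Without an explicit levels argument, factor() orders the levels alphabetically, which would not reflect the order of the scale.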