Category: Interview Questions


  • How to parse a date from its string representation in R?

    To parse a date from its string representation in R, we can use the lubridate package of the tidyverse collection. This package offers various functions for parsing a string and extracting the standard date from it based on the order of the date components in that string. These functions include ymd(), ymd_hm(), ymd_hms(), dmy(), dmy_hm(), dmy_hms(), mdy(), mdy_hm(), mdy_hms(), etc., where y, m, and d in the date part correspond to year, month, and day, and h, m, and s in the time part correspond to hours, minutes, and seconds.

    For example, if we run the dmy() function passing to it any of the strings “05-11-2023”, “05/11/2023” or “05.11.2023”, representing the same date, we’ll receive the same result: 2023-11-05. This is because in all three cases, despite having different dividing symbols, we actually have the same pattern: the day followed by the month followed by the year.
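
    The behavior described above can be sketched as follows, assuming the lubridate package is installed. The variable names (date_strings, parsed, parsed_time) are illustrative only.

    ```r
    library(lubridate)

    # The same date written with three different separators
    date_strings <- c("05-11-2023", "05/11/2023", "05.11.2023")

    # dmy() reads each string as day, month, year and returns Date objects
    parsed <- dmy(date_strings)
    print(parsed)

    # With a time part, dmy_hm() returns a date-time (UTC unless tz is given)
    parsed_time <- dmy_hm("05-11-2023 14:30")
    print(parsed_time)
    ```

    All three elements of parsed hold the same date, because only the order of the components matters, not the separator.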

  • How to create a new column in a data frame in R based on other columns?

    1. Using the transform() and ifelse() functions of base R:

    df <- data.frame(col_1 = c(1, 3, 5, 7),  col_2 = c(8, 6, 4, 2))
    print(df)

    # Adding the column col_3 to the data frame df
    df <- transform(df, col_3 = ifelse(col_1 < col_2, col_1 + col_2, col_1 * col_2))
    print(df)

    Output:

      col_1 col_2
    1     1     8
    2     3     6
    3     5     4
    4     7     2
      col_1 col_2 col_3
    1     1     8     9
    2     3     6     9
    3     5     4    20
    4     7     2    14

    2. Using the with() and ifelse() functions of base R:

    df <- data.frame(col_1 = c(1, 3, 5, 7),  col_2 = c(8, 6, 4, 2))
    print(df)

    # Adding the column col_3 to the data frame df
    df["col_3"] <- with(df, ifelse(col_1 < col_2, col_1 + col_2, col_1 * col_2))
    print(df)

    Output:

      col_1 col_2
    1     1     8
    2     3     6
    3     5     4
    4     7     2
      col_1 col_2 col_3
    1     1     8     9
    2     3     6     9
    3     5     4    20
    4     7     2    14

    3. Using the apply() function of base R:

    df <- data.frame(col_1 = c(1, 3, 5, 7),  col_2 = c(8, 6, 4, 2))
    print(df)

    # Adding the column col_3 to the data frame df
    # (apply() with MARGIN = 1 calls the function once per row)
    df["col_3"] <- apply(df, 1, FUN = function(x) if(x[1] < x[2]) x[1] + x[2] else x[1] * x[2])
    print(df)

    Output:

      col_1 col_2
    1     1     8
    2     3     6
    3     5     4
    4     7     2
      col_1 col_2 col_3
    1     1     8     9
    2     3     6     9
    3     5     4    20
    4     7     2    14

    4. Using the mutate() function of the dplyr package and the ifelse() function of base R:

    library(dplyr)

    df <- data.frame(col_1 = c(1, 3, 5, 7),  col_2 = c(8, 6, 4, 2))
    print(df)

    # Adding the column col_3 to the data frame df
    df <- mutate(df, col_3 = ifelse(col_1 < col_2, col_1 + col_2, col_1 * col_2))
    print(df)

    Output:

      col_1 col_2
    1     1     8
    2     3     6
    3     5     4
    4     7     2
      col_1 col_2 col_3
    1     1     8     9
    2     3     6     9
    3     5     4    20
    4     7     2    14

  • What is the difference between the subset() and sample() functions in R?

    The subset() function in R is used for extracting rows and columns from a data frame or a matrix, or elements from a vector, based on certain conditions, e.g.: subset(my_vector, my_vector > 10).

    By contrast, the sample() function works on the elements of a vector. It draws a random sample of a given size from those elements, with or without replacement, e.g.: sample(my_vector, size=5, replace=TRUE).
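
    A minimal sketch of the contrast, using base R only. The data (my_vector, df) and the result names are made up for illustration, and set.seed() is used only so that the random draw is repeatable.

    ```r
    my_vector <- c(4, 8, 15, 16, 23, 42)

    # subset() keeps the elements that satisfy a condition
    big_values <- subset(my_vector, my_vector > 10)
    print(big_values)

    # sample() draws elements at random; set.seed() makes the draw repeatable
    set.seed(123)
    random_values <- sample(my_vector, size = 5, replace = TRUE)
    print(random_values)

    # subset() can also filter rows and select columns of a data frame
    df <- data.frame(id = 1:4, score = c(55, 72, 90, 64))
    passed <- subset(df, score > 60, select = c(id, score))
    print(passed)
    ```

    The result of subset() is fully determined by the condition, while the result of sample() changes from run to run unless the seed is fixed.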

  • What is the difference between the str() and summary() functions in R?

    The str() function displays the structure of an R object and overall information about it, the exact contents of which depend on the data structure of that object. For example, for a vector, it shows the data type of its items, the range of item indices, and the item values (or the first several values, if the vector is too long). For a data frame, it shows its class (data.frame), the number of observations and variables, the column names, the data type of each column, and the first several values of each column.

    The summary() function returns the summary statistics for an R object. It’s mostly applied to data frames and matrices, for which it returns the minimum, maximum, mean, and median values, and the 1st and 3rd quartiles for each numeric column, while for the factor columns, it returns the count of each level.
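
    The difference can be seen on a small data frame. This is a sketch in base R; the data frame df and its columns are invented for the example.

    ```r
    df <- data.frame(
      height = c(150, 160, 170, 180),
      group = factor(c("a", "b", "a", "b"))
    )

    # str() prints the structure: class, dimensions, column types, first values
    str(df)

    # summary() computes per-column statistics: quartiles and mean for numeric
    # columns, level counts for factor columns
    print(summary(df))

    # For a single numeric vector, summary() returns a named set of statistics
    height_stats <- summary(df$height)
    print(height_stats)
    ```

    In short, str() describes how the object is built, while summary() describes the values it contains.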

  • What is the use of the next and break statements in R?

    The next statement is used to skip a particular iteration and jump to the next one if a certain condition is met. The break statement is used to stop and exit the loop at a particular iteration if a certain condition is met. When used in one of the inner loops of a nested loop, this statement exits only that inner loop.

    Both the next and break statements can be used in any type of loop in R: for loops, while loops, and repeat loops. They can also be used in the same loop, e.g.:

    for(i in 1:10) {
        if(i < 5)
            next
        if(i == 8)
            break
        print(i)
    }

    Output:

    [1] 5
    [1] 6
    [1] 7
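
    The point about nested loops can be illustrated with a short sketch. The vector visited is a hypothetical helper that records which (i, j) pairs were reached.

    ```r
    visited <- character(0)

    for (i in 1:3) {
      for (j in 1:3) {
        if (j == 2)
          break  # leaves only the inner loop; the outer loop moves on to the next i
        visited <- c(visited, paste0("i=", i, ",j=", j))
      }
    }

    print(visited)
    ```

    The outer loop still runs for every value of i, but each pass of the inner loop stops as soon as j reaches 2.
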
  • What is vector recycling in R?

    If we perform an operation on two R vectors of different lengths, R recycles the items of the shorter vector in the same order until its length matches that of the longer one, and only then performs the operation. If the length of the longer vector is not a multiple of the length of the shorter one, R also issues a warning about the mismatch.

    For example, if we try to run the following addition:

    c(1, 2, 3, 4, 5) + c(1, 2, 3)

    The second vector, due to the vector recycling, will actually be converted into c(1, 2, 3, 1, 2). Hence, the final result of this operation will be c(2, 4, 6, 5, 7).

    While sometimes vector recycling can be beneficial (e.g., when we expect the cyclicity of values in the vectors), more often, it’s inappropriate and misleading. Hence, we should be careful and mind the vectors’ lengths before performing operations on them.
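
    The recycling behavior can be checked directly in base R. In this sketch, the variable names are illustrative, and withCallingHandlers() is used only to record that a warning was raised without interrupting the script.

    ```r
    # Lengths 5 and 3: the shorter vector is recycled; 5 is not a multiple of 3,
    # so R raises a warning, which is recorded here and then muffled
    warned <- FALSE
    result <- withCallingHandlers(
      c(1, 2, 3, 4, 5) + c(1, 2, 3),
      warning = function(w) {
        warned <<- TRUE
        invokeRestart("muffleWarning")
      }
    )
    print(result)
    print(warned)

    # rep_len() shows what the recycled shorter vector looks like
    recycled <- rep_len(c(1, 2, 3), length.out = 5)
    print(recycled)

    # Lengths 6 and 3: 6 is a multiple of 3, so recycling happens silently
    silent <- c(1, 2, 3, 4, 5, 6) + c(10, 20, 30)
    print(silent)
    ```

    The silent case is the one to watch for, since nothing signals that the shorter vector was reused.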

  • What types of data plots can be created in R?

    Since data visualization is one of the strengths of the R programming language, we can create many types of data plots in R:

    • Common types of data plots:
      • Bar plot—shows the numerical values of categorical data.
      • Line plot—shows a progression of a variable, usually over time.
      • Scatter plot—shows the relationships between two variables.
      • Area plot—based on a line plot, with the area below the line colored or filled with a pattern.
      • Pie chart—shows the proportion of each category of categorical data as a part of the whole.
      • Box plot—shows a set of descriptive statistics of the data.
    • Advanced types of data plots:
      • Violin plot—shows both a set of descriptive statistics of the data and the distribution shape for that data.
      • Heatmap—shows the magnitude of each numeric data point within the dataset.
      • Treemap—shows the numerical values of categorical data, often as a part of the whole.
      • Dendrogram—shows an inner hierarchy and clustering of the data.
      • Bubble plot—shows the relationships between three variables.
      • Hexbin plot—shows the relationships of two numerical variables in a relatively large dataset.
      • Word cloud—shows the frequency of words in an input text.
      • Choropleth map—shows aggregate thematic statistics of geodata.
      • Circular packing chart—shows an inner hierarchy of the data and the values of the data points
      • etc.


  • How to chain several operations together in R?

    We can chain several operations in R by using the pipe operator (%>%), which comes from the magrittr package and is available when tidyverse packages such as dplyr are loaded. This operator creates a pipeline of functions where the output of the first function is passed as the input into the second function and so on, until the pipeline ends. This reduces the need for intermediate variables and makes the code easier to read.

    An example of using the pipe operator on a data frame:

    library(dplyr)

    df <- data.frame(a=1:4, b=11:14, c=21:24)
    print(df)

    # Keep columns a and b, then keep the rows where a is greater than 2
    df_new <- df %>% select(a, b) %>% filter(a > 2)
    print(df_new)

    Output:

      a  b  c
    1 1 11 21
    2 2 12 22
    3 3 13 23
    4 4 14 24
      a  b
    1 3 13
    2 4 14

  • How to transpose two-dimensional data in R?

    We can transpose a data frame or a matrix in R so that the columns become the rows and vice versa. For this purpose, we can use the t() function of base R. Note that t() returns a matrix, even when its input is a data frame. For example:

    df <- data.frame(col_1=c(10, 20, 30), col_2=c(11, 22, 33))
    print(df)

    transposed_df <- t(df)
    print(transposed_df)

    Output:

      col_1 col_2
    1    10    11
    2    20    22
    3    30    33
          [,1] [,2] [,3]
    col_1   10   20   30
    col_2   11   22   33

  • How to concatenate strings in R?

    We can concatenate two or more strings in R by using the paste() or cat() functions. The first approach is more popular, since paste() returns the concatenated string, while cat() only prints it. Both functions take in any number of strings to be concatenated and can also take in an optional parameter sep (along with some other optional parameters): a character or a sequence of characters that will separate the attached strings in the result (a white space by default).
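
    A brief sketch of both functions in base R. The strings and the variable names (greeting, dashed) are chosen only for illustration.

    ```r
    # paste() returns the joined string; the default separator is a single space
    greeting <- paste("Hello", "world")
    print(greeting)

    # sep changes the separator placed between the pieces
    dashed <- paste("2023", "11", "05", sep = "-")
    print(dashed)

    # cat() writes the joined pieces to the console and returns NULL
    cat("Hello", "world", sep = ", ")
    cat("\n")
    ```

    Because cat() does not return the string, paste() is the one to use when the result has to be stored or passed on.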