Category: Examples


  • Time Series Analysis

    Let’s analyze a simple time series dataset using the forecast package.

    Step 1: Install and Load Forecast

    If you don’t have forecast installed, you can install it:

    install.packages("forecast")
    

    Then, load the library:

    library(forecast)
    

    Step 2: Create a Time Series Dataset

    We’ll create a sample time series dataset:

    # Generate a time series dataset
    set.seed(101)
    ts_data <- ts(rnorm(120, mean = 10, sd = 2), frequency = 12, start = c(2020, 1))
    plot(ts_data, main = "Sample Time Series Data", ylab = "Value", xlab = "Time")
    

    Step 3: Decompose the Time Series

    # Decompose the time series
    decomposed <- decompose(ts_data)
    plot(decomposed)
    

    Step 4: Forecasting

    # Fit an ARIMA model and forecast
    fit <- auto.arima(ts_data)
    forecasted_values <- forecast(fit, h = 12)
    
    # Plot the forecast
    plot(forecasted_values, main = "Forecast for Next 12 Months")
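    Before relying on a forecast, it helps to check it against data the model has not seen. The following is a minimal sketch of a hold-out check using only base R, so it runs without the forecast package. The split point, the names train_ts, test_ts and mae, and the fixed mean-only order c(0, 0, 0) are illustrative assumptions; they are not what auto.arima() would necessarily select.

```r
# Recreate the sample series from Step 2
set.seed(101)
ts_data <- ts(rnorm(120, mean = 10, sd = 2), frequency = 12, start = c(2020, 1))

# Hold out the final 12 months (assumed split point)
train_ts <- window(ts_data, end = c(2028, 12))
test_ts  <- window(ts_data, start = c(2029, 1))

# Fit a mean-only ARIMA(0, 0, 0) with base R's arima(); the order is an assumption
fit_base <- arima(train_ts, order = c(0, 0, 0))

# Forecast the held-out period and compute the mean absolute error
pred <- predict(fit_base, n.ahead = length(test_ts))
mae  <- mean(abs(test_ts - pred$pred))
cat("Hold-out MAE:", mae, "\n")
```

    The same split can be reused with auto.arima() and forecast() from the steps above to compare candidate models on the held-out months.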
  • Machine Learning with Caret

    We’ll demonstrate a basic machine learning workflow using the caret package for building a predictive model.

    Step 1: Install and Load Caret

    If you don’t have caret installed, you can do so with:

    install.packages("caret")
    

    Then, load the library:

    library(caret)
    

    Step 2: Create a Sample Dataset

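    We’ll reuse the data frame named data from the “Analyzing and Visualizing a Dataset” example and add a binary outcome variable to predict:

    # Add a binary outcome variable, stored as a factor because
    # caret's classification functions and confusionMatrix() expect factors
    data$outcome <- factor(ifelse(data$weight > 70, "Heavy", "Light"))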
    

    Step 3: Split the Dataset

    Split the data into training and testing sets:

    # Set seed for reproducibility
    set.seed(123)
    
    # Create a training index
    train_index <- createDataPartition(data$outcome, p = 0.7, list = FALSE)
    
    # Split data into training and testing sets
    train_data <- data[train_index, ]
    test_data <- data[-train_index, ]
    

    Step 4: Train a Model

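    We’ll train a simple logistic regression model. Because outcome is derived directly from weight, the classes are perfectly separable, so expect glm to warn that fitted probabilities of 0 or 1 occurred:

    # Train a logistic regression model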
    model <- train(outcome ~ height + weight, data = train_data, method = "glm", family = "binomial")
    
    # Print the model summary
    summary(model)
    

    Step 5: Make Predictions

    Use the model to make predictions on the test set:

    # Make predictions on the test set
    predictions <- predict(model, newdata = test_data)
    
    # Confusion matrix to evaluate performance
    confusionMatrix(predictions, test_data$outcome)
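    To make clear what the confusion matrix reports, here is a minimal base R sketch that cross-tabulates predictions against actual labels and derives accuracy by hand. The vectors predicted and actual are small hypothetical stand-ins, not output of the model above.

```r
# Hypothetical predicted and actual labels (illustrative values only)
lvls      <- c("Heavy", "Light")
predicted <- factor(c("Heavy", "Light", "Light", "Heavy", "Light", "Heavy"), levels = lvls)
actual    <- factor(c("Heavy", "Light", "Heavy", "Heavy", "Light", "Light"), levels = lvls)

# Cross-tabulate: rows are predictions, columns are the true classes
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy is the share of cases on the diagonal (correct predictions)
accuracy <- sum(diag(cm)) / sum(cm)
cat("Accuracy:", accuracy, "\n")
```

    Both vectors share the same factor levels, which is also what confusionMatrix() requires of its two arguments.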
  • Hypothesis Testing

    We can perform a t-test to compare the means of weight between two age groups.

    Step 1: Create Age Groups

    # Create a new variable to classify age groups
    data$age_group <- ifelse(data$age > 30, "Above 30", "30 or Below")
    

    Step 2: Conduct a t-test

    # Perform a t-test to compare weights between the two age groups
    t_test_result <- t.test(weight ~ age_group, data = data)
    
    # Display the results
    print(t_test_result)
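    The object returned by t.test() is a list, so individual results can be pulled out instead of reading the printed summary. The sketch below rebuilds a small dataset so it runs on its own; the names welch and student are illustrative. By default t.test() runs Welch's test, which does not assume equal variances; passing var.equal = TRUE gives the classic Student's t-test.

```r
# Rebuild a small dataset so the sketch is self-contained
set.seed(123)
data <- data.frame(
  age    = sample(18:65, 100, replace = TRUE),
  weight = rnorm(100, mean = 70, sd = 15)
)
data$age_group <- ifelse(data$age > 30, "Above 30", "30 or Below")

# Welch's t-test (the default) and Student's t-test (equal variances assumed)
welch   <- t.test(weight ~ age_group, data = data)
student <- t.test(weight ~ age_group, data = data, var.equal = TRUE)

# Extract individual components from the result
p_value     <- welch$p.value   # p-value of the test
ci          <- welch$conf.int  # confidence interval for the difference in means
group_means <- welch$estimate  # estimated mean weight in each group

cat("Welch p-value:", p_value, "\n")
cat("Student p-value:", student$p.value, "\n")
```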
  • Advanced Visualization with ggplot2

    Let’s create more complex visualizations, such as a boxplot and a density plot.

    Boxplot

    # Load ggplot2 if it is not already attached
    library(ggplot2)

    # Boxplot to visualize the distribution of weight by age group
    ggplot(data, aes(x = factor(age > 30), y = weight)) +
      geom_boxplot(fill = "lightblue") +
      labs(title = "Boxplot of Weight by Age Group (Above/Below 30)",
           x = "Age > 30",
           y = "Weight (kg)") +
      theme_minimal()

    Density Plot

    # Density plot for height, with a constant fill colour
    ggplot(data, aes(x = height)) +
      geom_density(fill = "lightblue", alpha = 0.5) +
      labs(title = "Density Plot of Height",
           x = "Height (cm)",
           y = "Density") +
      theme_minimal()
  • Data Manipulation with dplyr

    In this example, we’ll use the dplyr package for data manipulation. We’ll filter, summarize, and arrange data.

    Step 1: Install and Load dplyr

    If you don’t have dplyr installed yet, you can install it with:

    install.packages("dplyr")
    

    Then, load the library:

    library(dplyr)
    

    Step 2: Create a Sample Dataset

    We’ll continue using the previous dataset or create a new one:

    # Create a sample dataset
    set.seed(456)
    data <- data.frame(
      id = 1:100,
      age = sample(18:65, 100, replace = TRUE),
      height = rnorm(100, mean = 170, sd = 10),
      weight = rnorm(100, mean = 70, sd = 15)
    )
    

    Step 3: Data Manipulation

    1. Filtering Data: Let’s keep only individuals who are older than 30.
    # Filter data for individuals older than 30
    filtered_data <- data %>% filter(age > 30)
    head(filtered_data)

    2. Summarizing Data: We can calculate the average height and weight for this filtered group.
    # Summarize to get mean height and weight for individuals older than 30
    summary_stats <- filtered_data %>%
      summarize(
        mean_height = mean(height),
        mean_weight = mean(weight),
        count = n()
      )
    print(summary_stats)

    3. Arranging Data: Sort the dataset by height in descending order.
    # Arrange data by height in descending order
    arranged_data <- data %>% arrange(desc(height))
    head(arranged_data)
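    For comparison, the three operations above can also be sketched in base R, which may help when reading code that does not use dplyr. The object names filtered_base, summary_base and arranged_base are illustrative; the dataset is recreated so the block runs on its own.

```r
# Recreate the sample dataset from Step 2
set.seed(456)
data <- data.frame(
  id     = 1:100,
  age    = sample(18:65, 100, replace = TRUE),
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)

# 1. Filter: logical row indexing instead of filter()
filtered_base <- data[data$age > 30, ]

# 2. Summarize: build a one-row data frame instead of summarize()
summary_base <- data.frame(
  mean_height = mean(filtered_base$height),
  mean_weight = mean(filtered_base$weight),
  count       = nrow(filtered_base)
)

# 3. Arrange: order() with decreasing = TRUE instead of arrange(desc())
arranged_base <- data[order(data$height, decreasing = TRUE), ]

print(summary_base)
head(arranged_base)
```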
  • Statistical Analysis

    We can perform a linear regression analysis to understand the relationship between height and weight.

    # Linear regression model
    model <- lm(weight ~ height, data = data)
    
    # Display the model summary
    summary(model)
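    Beyond the printed summary, the fitted model can be queried directly. The sketch below, which recreates a small dataset so it runs on its own, extracts the coefficients and R-squared and predicts weight for new heights. The data frame new_people and its values are hypothetical.

```r
# Recreate a small dataset with the same structure as in the examples
set.seed(123)
data <- data.frame(
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)

# Fit the same model form as above
model <- lm(weight ~ height, data = data)

# Intercept and slope of the fitted line
coefs <- coef(model)

# Proportion of variance in weight explained by height
r_squared <- summary(model)$r.squared

# Predict weight for two hypothetical heights (in cm)
new_people       <- data.frame(height = c(160, 180))
predicted_weight <- predict(model, newdata = new_people)

print(coefs)
cat("R-squared:", r_squared, "\n")
print(predicted_weight)
```

    Since height and weight are simulated independently here, a slope near zero and a low R-squared are to be expected.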
  • Data Visualization

    Using the ggplot2 package, we can create a scatter plot to visualize the relationship between height and weight.

    # Load ggplot2 package
    library(ggplot2)
    
    # Create a scatter plot of height vs weight
    ggplot(data, aes(x = height, y = weight)) +
      geom_point(color = 'blue') +
      labs(title = "Scatter Plot of Height vs Weight",
           x = "Height (cm)",
           y = "Weight (kg)") +
      theme_minimal()
  • Basic Data Summary

    # Summary statistics
    summary(data)
    
    # Calculate mean height and weight
    mean_height <- mean(data$height)
    mean_weight <- mean(data$weight)
    
    cat("Mean Height:", mean_height, "\n")
    cat("Mean Weight:", mean_weight, "\n")
  • Analyzing and Visualizing a Dataset

    Step 1: Create a Dataset

    # Create a sample dataset
    set.seed(123)  # For reproducibility
    data <- data.frame(
      id = 1:100,
      age = sample(18:65, 100, replace = TRUE),
      height = rnorm(100, mean = 170, sd = 10),
      weight = rnorm(100, mean = 70, sd = 15)
    )
    
    # View the first few rows of the dataset
    head(data)