Category: Examples

Time Series Analysis

Let’s analyze a simple time series dataset using the forecast package.

Step 1: Install and Load Forecast

If you don’t have forecast installed, you can install it:

install.packages("forecast")

Then, load the library:

library(forecast)

Step 2: Create a Time Series Dataset

We’ll create a sample time series dataset:

# Generate a time series dataset
set.seed(101)
ts_data <- ts(rnorm(120, mean = 10, sd = 2), frequency = 12, start = c(2020, 1))
plot(ts_data, main = "Sample Time Series Data", ylab = "Value", xlab = "Time")

Step 3: Decompose the Time Series

# Decompose the time series
decomposed <- decompose(ts_data)
plot(decomposed)
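
To see what decompose() actually returns, the sketch below rebuilds the same series and checks that, for the default additive model, the trend, seasonal, and random components sum back to the observed values. It uses only base R (decompose() lives in the stats package). The variable names trend_part, seasonal_part, random_part, rebuilt, and max_gap are illustrative and not part of the original example.

```r
# Rebuild the sample series from Step 2 so this block runs on its own
set.seed(101)
ts_data <- ts(rnorm(120, mean = 10, sd = 2), frequency = 12, start = c(2020, 1))

# decompose() defaults to an additive model
decomposed <- decompose(ts_data)

# Components stored in the result
trend_part    <- decomposed$trend     # centered moving average (NA at both ends)
seasonal_part <- decomposed$seasonal  # repeating monthly pattern
random_part   <- decomposed$random    # remainder after removing the other two

# For an additive model the three parts sum back to the observed series
rebuilt <- trend_part + seasonal_part + random_part
keep <- !is.na(rebuilt)
max_gap <- max(abs(rebuilt[keep] - ts_data[keep]))
print(max_gap)  # should be zero up to floating-point error

# One estimated seasonal effect per month
print(round(decomposed$figure, 3))
```

Because the sample data are random noise, the seasonal pattern estimated here reflects chance rather than real seasonality; with real data the same components would be read as trend, seasonal effect, and residual.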

Step 4: Forecasting

# Fit an ARIMA model and forecast
fit <- auto.arima(ts_data)
forecasted_values <- forecast(fit, h = 12)

# Plot the forecast
plot(forecasted_values, main = "Forecast for Next 12 Months")
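
One way to judge a forecast is to hold out the last year of data and compare predictions against it. The sketch below does this with base R only, so it substitutes stats::arima() with a hand-picked AR(1) order for auto.arima(); that order is an assumption for illustration, not what auto.arima() would necessarily select. The names train_ts, test_ts, fit_base, pred, and rmse are illustrative.

```r
# Rebuild the sample series from Step 2 so this block runs on its own
set.seed(101)
ts_data <- ts(rnorm(120, mean = 10, sd = 2), frequency = 12, start = c(2020, 1))

# Hold out the final 12 months (calendar year 2029) as a test set
train_ts <- window(ts_data, end = c(2028, 12))
test_ts  <- window(ts_data, start = c(2029, 1))

# Assumption: a fixed AR(1) model from base R, standing in for auto.arima()
fit_base <- arima(train_ts, order = c(1, 0, 0))

# Predict the 12 held-out months
pred <- predict(fit_base, n.ahead = 12)

# Root mean squared error of the predictions on the held-out year
rmse <- sqrt(mean((as.numeric(test_ts) - as.numeric(pred$pred))^2))
print(rmse)
```

With the forecast package loaded, the same split could be passed to auto.arima() and forecast() in place of arima() and predict().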

October 30, 2024

Machine Learning with Caret

We’ll demonstrate a basic machine learning workflow using the caret package for building a predictive model.

Step 1: Install and Load Caret

If you don’t have caret installed, you can do so with:

install.packages("caret")

Then, load the library:

library(caret)

Step 2: Create a Sample Dataset

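We’ll reuse the sample dataset (the data frame named data, with age, height, and weight columns) from the earlier examples and add a binary outcome variable to predict:

# Add a binary outcome variable.
# It is stored as a factor because confusionMatrix() requires factors.
data$outcome <- factor(ifelse(data$weight > 70, "Heavy", "Light"))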

Step 3: Split the Dataset

Split the data into training and testing sets:

# Set seed for reproducibility
set.seed(123)

# Create a training index
train_index <- createDataPartition(data$outcome, p = 0.7, list = FALSE)

# Split data into training and testing sets
train_data <- data[train_index, ]
test_data <- data[-train_index, ]
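
To illustrate the idea behind a stratified split, here is a base R sketch that samples about 70% of the rows within each outcome class. It approximates the behavior for a two-class outcome and is not caret's actual implementation; stratified_index() is a hypothetical helper written for this example.

```r
# Rebuild the sample dataset (seed 123) with the outcome column
set.seed(123)
data <- data.frame(
  id = 1:100,
  age = sample(18:65, 100, replace = TRUE),
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)
data$outcome <- factor(ifelse(data$weight > 70, "Heavy", "Light"))

# Hypothetical helper: pick about a fraction p of the row indices in each class
stratified_index <- function(y, p) {
  rows_by_class <- split(seq_along(y), y)
  picked <- lapply(rows_by_class, function(rows) {
    rows[sample.int(length(rows), size = ceiling(p * length(rows)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(123)
train_index <- stratified_index(data$outcome, p = 0.7)
train_data <- data[train_index, ]
test_data  <- data[-train_index, ]

# Class proportions should be similar in the full, training, and test sets
print(prop.table(table(data$outcome)))
print(prop.table(table(train_data$outcome)))
print(prop.table(table(test_data$outcome)))
```

Sampling within each class keeps the Heavy/Light balance roughly the same in both subsets, which a plain random split does not guarantee.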

Step 4: Train a Model

We’ll train a simple logistic regression model:

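# Train a logistic regression model
# Note: outcome is derived directly from weight, so the classes are perfectly
# separable and glm may warn about fitted probabilities of 0 or 1.
model <- train(outcome ~ height + weight, data = train_data, method = "glm", family = "binomial")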

# Print the model summary
summary(model)

Step 5: Make Predictions

Use the model to make predictions on the test set:

# Make predictions on the test set
predictions <- predict(model, newdata = test_data)

# Confusion matrix to evaluate performance
confusionMatrix(predictions, test_data$outcome)
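
The main numbers reported by a confusion matrix can be reproduced by hand with base R. The sketch below uses two short, made-up label vectors (hypothetical values, not output from the model above) and treats the first factor level, Heavy, as the positive class, which is also confusionMatrix()'s default.

```r
# Hypothetical actual and predicted labels, for illustration only
lvls <- c("Heavy", "Light")
actual    <- factor(c("Heavy", "Light", "Heavy", "Light", "Light", "Heavy"), levels = lvls)
predicted <- factor(c("Heavy", "Light", "Light", "Light", "Heavy", "Heavy"), levels = lvls)

# Cross-tabulate predictions (rows) against the truth (columns)
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy: share of cases on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity: correctly predicted Heavy out of all actual Heavy
sensitivity <- cm["Heavy", "Heavy"] / sum(cm[, "Heavy"])

# Specificity: correctly predicted Light out of all actual Light
specificity <- cm["Light", "Light"] / sum(cm[, "Light"])

print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

Both vectors share the same factor levels, which is the same requirement confusionMatrix() places on its inputs.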

October 30, 2024

Hypothesis Testing

We can use a t-test to compare mean weight between two age groups. By default, t.test() runs Welch’s test, which does not assume equal variances in the two groups.

Step 1: Create Age Groups

# Create a new variable to classify age groups
data$age_group <- ifelse(data$age > 30, "Above 30", "30 or Below")

Step 2: Conduct a t-test

# Perform a t-test to compare weights between the two age groups
t_test_result <- t.test(weight ~ age_group, data = data)

# Display the results
print(t_test_result)
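
The printed result is an htest object, and its individual pieces can be pulled out for reporting. The sketch below assumes the sample dataset created with seed 123 and also shows the equal-variance (Student) variant for comparison; student_result is an illustrative name.

```r
# Rebuild the sample dataset (seed 123) and the age groups
set.seed(123)
data <- data.frame(
  id = 1:100,
  age = sample(18:65, 100, replace = TRUE),
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)
data$age_group <- ifelse(data$age > 30, "Above 30", "30 or Below")

# Welch's t-test (the default: var.equal = FALSE)
t_test_result <- t.test(weight ~ age_group, data = data)

print(t_test_result$method)    # which test was run
print(t_test_result$estimate)  # mean weight in each group
print(t_test_result$conf.int)  # 95% CI for the difference in means
print(t_test_result$p.value)   # two-sided p-value

# Student's t-test, which assumes equal variances in both groups
student_result <- t.test(weight ~ age_group, data = data, var.equal = TRUE)
print(student_result$p.value)
```

Since weight is generated independently of age in this dataset, a large p-value is the expected outcome here.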

October 30, 2024

Advanced Visualization with ggplot2

Let’s create more complex visualizations, such as a boxplot and a density plot.

Boxplot

# Load ggplot2 (needed for the plots in this section)
library(ggplot2)

# Boxplot to visualize the distribution of weight by age group
ggplot(data, aes(x = factor(age > 30), y = weight)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Boxplot of Weight by Age Group (Above/Below 30)",
       x = "Age > 30",
       y = "Weight (kg)") +
  theme_minimal()

Density Plot

# Density plot for height
# A constant fill is used; mapping fill to the computed count is not needed here
ggplot(data, aes(x = height)) +
  geom_density(fill = "lightblue", alpha = 0.5) +
  labs(title = "Density Plot of Height",
       x = "Height (cm)",
       y = "Density") +
  theme_minimal()

October 30, 2024

Data Manipulation with dplyr

In this example, we’ll use the dplyr package for data manipulation. We’ll filter, summarize, and arrange data.

Step 1: Install and Load dplyr

If you don’t have dplyr installed yet, you can install it with:

install.packages("dplyr")

Then, load the library:

library(dplyr)

Step 2: Create a Sample Dataset

You can keep using the dataset from the previous example, or create a fresh one:

# Create a sample dataset
set.seed(456)
data <- data.frame(
  id = 1:100,
  age = sample(18:65, 100, replace = TRUE),
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)

Step 3: Data Manipulation

Filtering Data: Let’s filter individuals who are above 30 years old.

# Filter data for individuals older than 30
filtered_data <- data %>% filter(age > 30)
head(filtered_data)

Summarizing Data: We can calculate the average height and weight for this filtered group.

# Summarize to get mean height and weight for individuals older than 30
summary_stats <- filtered_data %>%
  summarize(
    mean_height = mean(height),
    mean_weight = mean(weight),
    count = n()
  )
print(summary_stats)

Arranging Data: Sort the dataset by height in descending order.

# Arrange data by height in descending order
arranged_data <- data %>% arrange(desc(height))
head(arranged_data)
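
For comparison, the same three operations can be sketched in base R, which may help clarify what each dplyr verb does. The object names filtered_base, summary_base, and arranged_base are illustrative, and the dataset is the one created with seed 456 in Step 2.

```r
# Rebuild the sample dataset from Step 2 so this block runs on its own
set.seed(456)
data <- data.frame(
  id = 1:100,
  age = sample(18:65, 100, replace = TRUE),
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)

# Equivalent of filter(age > 30): keep rows where the condition is TRUE
filtered_base <- data[data$age > 30, ]

# Equivalent of summarize(...): one row of summary values
summary_base <- data.frame(
  mean_height = mean(filtered_base$height),
  mean_weight = mean(filtered_base$weight),
  count = nrow(filtered_base)
)
print(summary_base)

# Equivalent of arrange(desc(height)): reorder rows by height, tallest first
arranged_base <- data[order(data$height, decreasing = TRUE), ]
print(head(arranged_base))
```

The dplyr versions read more naturally once several steps are chained with %>%, which is the main reason to prefer them for longer pipelines.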

October 30, 2024

Statistical Analysis

We can fit a linear regression to model weight as a function of height.

```
# Linear regression model
model <- lm(weight ~ height, data = data)

# Display the model summary
summary(model)
```
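
Beyond the printed summary, the fitted model can be queried directly. The sketch below assumes the sample dataset created with seed 123, extracts the coefficients and R-squared, and predicts weight for two hypothetical heights (160 cm and 180 cm, chosen only for illustration).

```r
# Rebuild the sample dataset (seed 123) so this block runs on its own
set.seed(123)
data <- data.frame(
  id = 1:100,
  age = sample(18:65, 100, replace = TRUE),
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)

model <- lm(weight ~ height, data = data)

# Intercept and slope of the fitted line
coefs <- coef(model)
print(coefs)

# Share of the variance in weight explained by height
r_squared <- summary(model)$r.squared
print(r_squared)

# Predicted weight for two hypothetical heights
new_people <- data.frame(height = c(160, 180))
predicted_weight <- predict(model, newdata = new_people)
print(predicted_weight)
```

Height and weight are generated independently in this dataset, so a slope near zero and a small R-squared are expected; real measurements would normally show a clearer relationship.
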
October 30, 2024

Data Visualization

Using the ggplot2 package, we can create a scatter plot to visualize the relationship between height and weight.

# Load ggplot2 package
library(ggplot2)

# Create a scatter plot of height vs weight
ggplot(data, aes(x = height, y = weight)) +
  geom_point(color = 'blue') +
  labs(title = "Scatter Plot of Height vs Weight",
   x = "Height (cm)",
   y = "Weight (kg)") +
  theme_minimal()

October 30, 2024

Basic Data Summary

# Summary statistics
summary(data)

# Calculate mean height and weight
mean_height <- mean(data$height)
mean_weight <- mean(data$weight)

cat("Mean Height:", mean_height, "\n")
cat("Mean Weight:", mean_weight, "\n")
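
To avoid repeating the same calculation per column, the statistics can be computed for several columns in one call. This is a minimal sketch using sapply() on the sample dataset created with seed 123; numeric_cols and col_stats are illustrative names.

```r
# Rebuild the sample dataset (seed 123) so this block runs on its own
set.seed(123)
data <- data.frame(
  id = 1:100,
  age = sample(18:65, 100, replace = TRUE),
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)

# Columns to summarize (id is left out because it is only a row label)
numeric_cols <- c("age", "height", "weight")

# Apply the same set of statistics to each column; the result is a matrix
# with one row per statistic and one column per variable
col_stats <- sapply(data[numeric_cols], function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
})
print(round(col_stats, 2))
```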

October 30, 2024

Analyzing and Visualizing a Dataset

Step 1: Create a Dataset

# Create a sample dataset
set.seed(123)  # For reproducibility
data <- data.frame(
  id = 1:100,
  age = sample(18:65, 100, replace = TRUE),
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)

# View the first few rows of the dataset
head(data)

October 30, 2024