Category: Examples


  • Simulation Studies

    Simulation studies are crucial for understanding the behavior of statistical methods. Here’s an example of simulating the Central Limit Theorem.

    Step 1: Set Parameters

    # Set parameters
    n <- 30  # Sample size
    num_simulations <- 1000  # Number of simulations
    
    # Set seed for reproducibility
    set.seed(123)
    

    Step 2: Simulate Sample Means

    # Simulate sample means from an exponential distribution
    sample_means <- replicate(num_simulations, mean(rexp(n, rate = 1)))
    
    # View the first few sample means
    head(sample_means)
    

    Step 3: Plot the Distribution of Sample Means

    # Plot the distribution of sample means
    hist(sample_means, breaks = 30, main = "Distribution of Sample Means",
         xlab = "Sample Mean", col = "lightblue", probability = TRUE)
    lines(density(sample_means), col = "red")
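    The CLT predicts that these sample means are approximately normal with mean 1 and standard deviation 1/sqrt(n), since an Exp(rate = 1) distribution has mean 1 and standard deviation 1. A quick base-R sanity check of that prediction (re-running the simulation from Steps 1–2 so the sketch is self-contained):

```r
# Re-run the simulation: for Exp(rate = 1), mean = 1 and sd = 1,
# so sample means should be close to N(1, 1/sqrt(n))
set.seed(123)
n <- 30
num_simulations <- 1000
sample_means <- replicate(num_simulations, mean(rexp(n, rate = 1)))

# Compare the empirical mean and sd to the CLT prediction
empirical_mean <- mean(sample_means)
empirical_sd   <- sd(sample_means)
theoretical_sd <- 1 / sqrt(n)

cat("Empirical mean:", round(empirical_mean, 3), "(theory: 1)\n")
cat("Empirical sd:  ", round(empirical_sd, 3),
    "(theory:", round(theoretical_sd, 3), ")\n")
```

    Both empirical values should land within a few hundredths of the theoretical ones; increasing num_simulations tightens the agreement.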
  • Functional Programming with R

    Functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions. In R, you can use base functions such as lapply and sapply, or map from the purrr package.

    Step 1: Install and Load purrr

    install.packages("purrr")
    library(purrr)
    

    Step 2: Create a Sample List

    # Create a list of numeric vectors
    num_list <- list(a = 1:5, b = 6:10, c = 11:15)
    

    Step 3: Use lapply and sapply

    # Apply a function to each element using lapply
    squared_list <- lapply(num_list, function(x) x^2)
    print(squared_list)
    
    # Use sapply to simplify the output to a matrix
    squared_matrix <- sapply(num_list, function(x) x^2)
    print(squared_matrix)
    

    Step 4: Use map from purrr

    # Use map to apply a function and return a list
    squared_map <- map(num_list, ~ .x^2)
    print(squared_map)
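    purrr also offers typed variants such as map_dbl that guarantee the result's type; base R's vapply gives similar type safety without any packages. A minimal base-R sketch:

```r
num_list <- list(a = 1:5, b = 6:10, c = 11:15)

# vapply is like sapply, but you declare the expected output shape with
# FUN.VALUE; it raises an error instead of silently returning a surprising type
sums <- vapply(num_list, sum, FUN.VALUE = numeric(1))
print(sums)  # named numeric vector: a = 15, b = 40, c = 65
```

    This makes vapply a safer choice than sapply inside functions, where an unexpected list or matrix result can cause hard-to-trace bugs.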
  • Integration with Databases using DBI and RMySQL

    R can connect to databases to perform data analysis on large datasets. Here’s how to connect to a MySQL database.

    Step 1: Install and Load Required Packages

    install.packages("DBI")
    install.packages("RMySQL")
    library(DBI)
    library(RMySQL)
    

    Step 2: Connect to the Database

    # Connect to the MySQL database
    con <- dbConnect(RMySQL::MySQL(),
                     dbname = "your_database_name",
                     host = "your_host",
                     user = "your_username",
                     password = "your_password")

    Step 3: Query Data

    # Query data from a table
    data_db <- dbGetQuery(con, "SELECT * FROM your_table_name LIMIT 100")
    
    # View the queried data
    head(data_db)
    

    Step 4: Disconnect from the Database

    # Disconnect from the database
    dbDisconnect(con)
  • Geographic Data Analysis with sf and ggplot2

    Geospatial data analysis is crucial for visualizing and analyzing spatial relationships. We’ll use the sf package for handling spatial data.

    Step 1: Install and Load sf

    install.packages("sf")
    install.packages("ggplot2")  # Make sure ggplot2 is installed
    library(sf)
    library(ggplot2)
    

    Step 2: Load Geographic Data

    For this example, you can use built-in datasets or download shapefiles. Here, we’ll use a simple example with the nc dataset from the sf package.

    # Load the North Carolina shapefile (included in the sf package)
    nc <- st_read(system.file("shape/nc.shp", package = "sf"))
    
    # Plot the geographic data
    ggplot(data = nc) +
      geom_sf() +
      labs(title = "North Carolina Counties",
           x = "Longitude", y = "Latitude") +
      theme_minimal()

    Step 3: Analyze and Visualize Attributes

    # Calculate the area of each county and add it as a new column
    # (st_area returns a units object; convert to plain numeric for plotting)
    nc$area <- as.numeric(st_area(nc))
    
    # Plot with area as fill
    ggplot(data = nc) +
      geom_sf(aes(fill = area)) +
      labs(title = "Area of North Carolina Counties",
           fill = "Area (sq meters)") +
      theme_minimal()
  • Advanced Statistical Modeling with Mixed-Effects Models

    Mixed-effects models are useful when dealing with data that have both fixed and random effects. We’ll use the lme4 package for this.

    Step 1: Install and Load lme4

    install.packages("lme4")
    library(lme4)
    

    Step 2: Create a Sample Dataset

    # Create a sample dataset with random effects
    set.seed(222)
    data_mixed <- data.frame(
      id = rep(1:10, each = 10),
      x = rnorm(100),
      y = rnorm(100)
    )
    
    # Introduce a random effect
    data_mixed$y <- data_mixed$y + rep(rnorm(10, mean = 5, sd = 1), each = 10)
    

    Step 3: Fit a Mixed-Effects Model

    # Fit a mixed-effects model
    model_mixed <- lmer(y ~ x + (1 | id), data = data_mixed)
    
    # Display the model summary
    summary(model_mixed)
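    The random intercept (1 | id) in this model captures the per-id shifts introduced above, which were drawn from N(5, 1). As a base-R intuition check (no lme4 required), the per-group means should scatter around 5 with a spread of roughly 1:

```r
# Recreate the same dataset (assumes the same seed as in Step 2)
set.seed(222)
data_mixed <- data.frame(
  id = rep(1:10, each = 10),
  x = rnorm(100),
  y = rnorm(100)
)
data_mixed$y <- data_mixed$y + rep(rnorm(10, mean = 5, sd = 1), each = 10)

# Per-group means are crude estimates of the group-level intercepts;
# lmer's random-effect estimates shrink these toward the overall mean
group_means <- tapply(data_mixed$y, data_mixed$id, mean)
cat("Mean of group means:", round(mean(group_means), 2), "\n")  # near 5
cat("SD of group means:  ", round(sd(group_means), 2), "\n")
```

    The SD of the group means reflects both the between-group variance and a little within-group noise, which is exactly what the random-intercept variance component in the lmer summary separates out.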
  • Network Analysis with igraph

    Network analysis is essential for understanding relationships in data. We’ll use the igraph package.

    Step 1: Install and Load igraph

    install.packages("igraph")
    library(igraph)
    

    Step 2: Create a Sample Graph

    # Create a sample graph
    edges <- data.frame(
      from = c("A", "A", "B", "C", "C", "D", "E"),
      to = c("B", "C", "D", "D", "E", "E", "A")
    )
    
    # Create a graph object
    graph <- graph_from_data_frame(edges, directed = TRUE)
    
    # Plot the graph
    plot(graph, vertex.color = "lightblue", vertex.size = 30, edge.arrow.size = 0.5,
         main = "Sample Directed Graph")

    Step 3: Analyze the Graph

    # Calculate degree centrality
    degree_centrality <- degree(graph)
    print(degree_centrality)
    
    # Find the connected component each vertex belongs to
    component_membership <- components(graph)$membership
    print(component_membership)
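    Degree centrality here is simply the number of edges touching each vertex, so you can cross-check igraph's answer straight from the edge list with base R alone:

```r
edges <- data.frame(
  from = c("A", "A", "B", "C", "C", "D", "E"),
  to = c("B", "C", "D", "D", "E", "E", "A")
)

# Total degree = appearances as a source plus appearances as a target
degree_by_hand <- table(c(edges$from, edges$to))
print(degree_by_hand)  # A = 3, B = 2, C = 3, D = 3, E = 3
```

    These totals match what degree(graph) reports for the directed graph (in- plus out-degree).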
  • Text Analysis with tm and wordcloud

    Text analysis is vital for extracting insights from unstructured data. Here, we’ll analyze a simple text corpus.

    Step 1: Install and Load Required Packages

    install.packages("tm")
    install.packages("wordcloud")
    library(tm)
    library(wordcloud)
    

    Step 2: Create a Sample Text Corpus

    # Create a sample text corpus
    texts <- c("R is great for data analysis.",
               "Data science is an exciting field.",
               "R and Python are popular programming languages.",
               "Data visualization is key to understanding data.")
    
    # Create a corpus
    corpus <- Corpus(VectorSource(texts))
    
    # Preprocess the text (lower case, remove punctuation and stopwords)
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeWords, stopwords("en"))

    Step 3: Create a Term-Document Matrix

    # Create a term-document matrix
    tdm <- TermDocumentMatrix(corpus)
    tdm_matrix <- as.matrix(tdm)
    word_freqs <- sort(rowSums(tdm_matrix), decreasing = TRUE)
    word_freqs_df <- data.frame(word = names(word_freqs), freq = word_freqs)
    

    Step 4: Generate a Word Cloud

    # Create a word cloud
    set.seed(1234)
    wordcloud(words = word_freqs_df$word, freq = word_freqs_df$freq, min.freq = 1,
              max.words = 100, random.order = FALSE, rot.per = 0.35,
              colors = brewer.pal(8, "Dark2"))
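    The term frequencies driving the cloud can be sanity-checked without tm, using only base string functions (a rough tokenizer: it lower-cases and strips punctuation, but does not remove stopwords):

```r
texts <- c("R is great for data analysis.",
           "Data science is an exciting field.",
           "R and Python are popular programming languages.",
           "Data visualization is key to understanding data.")

# Lower-case, strip punctuation, split on whitespace, then count
words <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", texts)), "\\s+"))
word_counts <- sort(table(words), decreasing = TRUE)
head(word_counts)  # "data" tops the list with 4 occurrences
```

    Because this version keeps stopwords, words like "is" also rank highly; the tm pipeline above filters those out before building the term-document matrix.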
  • Clustering with k-means

    Clustering is a powerful technique for grouping similar data points. We’ll use the k-means algorithm.

    Step 1: Create a Sample Dataset

    # Generate a sample dataset
    set.seed(111)
    cluster_data <- data.frame(
      x = rnorm(100),
      y = rnorm(100)
    )
    
    # Visualize the data
    plot(cluster_data$x, cluster_data$y, main = "Sample Data for Clustering", xlab = "X", ylab = "Y")
    

    Step 2: Apply k-means Clustering

    # Apply k-means clustering
    kmeans_result <- kmeans(cluster_data, centers = 3)
    
    # Add the cluster assignments to the dataset
    cluster_data$cluster <- as.factor(kmeans_result$cluster)
    
    # Plot the clusters
    library(ggplot2)
    ggplot(cluster_data, aes(x = x, y = y, color = cluster)) +
      geom_point() +
      labs(title = "K-means Clustering Result") +
      theme_minimal()
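    Choosing centers = 3 above was arbitrary. A common heuristic is the elbow method: plot the total within-cluster sum of squares against k and look for the point where the curve flattens (base R only):

```r
set.seed(111)
cluster_data <- data.frame(x = rnorm(100), y = rnorm(100))

# Total within-cluster sum of squares for k = 1..8
# (nstart = 10 reruns each k from several random starts for stability)
wss <- sapply(1:8, function(k) {
  kmeans(cluster_data, centers = k, nstart = 10)$tot.withinss
})

# Look for the "elbow" where adding clusters stops paying off
plot(1:8, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster SS", main = "Elbow Method")
```

    For pure noise like this sample, the curve declines smoothly with no clear elbow, which is itself a hint that the data may not contain real clusters.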
  • Complex Visualization with ggplot2

    We can create faceted plots and combine multiple visualizations.

    Faceted Plot

    # Create faceted plots by age group
    ggplot(data, aes(x = height, y = weight)) +
      geom_point(aes(color = age_group), alpha = 0.7) +
      facet_wrap(~ age_group) +
      labs(title = "Height vs Weight by Age Group",
           x = "Height (cm)",
           y = "Weight (kg)") +
      theme_minimal()

    Combining Plots

    You can also use the patchwork package to combine multiple plots:

    # Install and load patchwork
    install.packages("patchwork")
    library(patchwork)
    
    # Scatter plot and boxplot
    scatter_plot <- ggplot(data, aes(x = height, y = weight)) + geom_point() + theme_minimal()
    box_plot <- ggplot(data, aes(x = age_group, y = weight)) + geom_boxplot() + theme_minimal()
    
    # Combine plots
    combined_plot <- scatter_plot + box_plot + plot_layout(ncol = 2)
    print(combined_plot)