Learn R in 10 minutes

R is a powerful programming language and environment for statistical computing, data analysis, and visualization. This tutorial covers the essential concepts to get you started with R programming.

1. Getting Started with R

Let’s start with a simple program. Open R or RStudio and enter the following code:

print("Hello, World!")

Run it in the R console or as a script. The output is:

[1] "Hello, World!"

The print() function displays a value in the console. The [1] prefix is the index of the first element shown on that line, because in R even a single string is a vector of length one.

2. Basic Syntax and Variables

R’s syntax is designed for statistical computing and data analysis. Let’s explore basic variable assignment and operations.

# This is a comment
x <- 5  # Assignment using <- operator
y = 10   # Assignment using = operator (less common)
print(x + y)

Basic syntax rules in R:

  • Assignment: Use <- for variable assignment (preferred) or =
  • Comments: Single-line comments start with #
  • Functions: Use parentheses () for function calls
  • Vectors: R is vectorized, so arithmetic and comparison operators act element-wise on entire vectors
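
The vectorization rule is easiest to see next to an explicit loop. The following is a minimal sketch; the names prices, tax_rate, and with_tax and the 8% rate are made up for illustration.

```r
# Hypothetical data: three prices and an assumed 8% tax rate
prices <- c(10, 20, 30)
tax_rate <- 0.08

# Vectorized: one expression operates on every element
with_tax <- prices * (1 + tax_rate)

# Equivalent explicit loop, shown only for comparison
with_tax_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  with_tax_loop[i] <- prices[i] * (1 + tax_rate)
}

print(with_tax)
print(all.equal(with_tax, with_tax_loop))
```

Both versions produce the same values; the vectorized form is shorter and usually faster.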

3. Data Types and Structures

R has several fundamental data types and structures for statistical computing.

3.1 Basic Data Types

# Numeric (double)
num <- 3.14
print(class(num))

# Integer
int <- 42L
print(class(int))

# Character (string)
text <- "Hello R"
print(class(text))

# Logical (boolean)
flag <- TRUE
print(class(flag))
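
Values can be tested and converted between these types with the is.* and as.* functions. A short sketch, using made-up values:

```r
value <- "42"                  # a character string that looks like a number
print(is.character(value))     # TRUE

number <- as.numeric(value)    # convert the string to a double
print(class(number))           # "numeric"

whole <- as.integer(3.9)       # conversion truncates toward zero
print(whole)                   # 3

print(as.numeric(TRUE))        # TRUE converts to 1
```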

3.2 Vectors

Vectors are the fundamental data structure in R. A vector is one-dimensional and holds elements of a single type, such as numeric, character, or logical.

# Creating vectors
numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE)

# Vector operations
print(numeric_vector * 2)  # Multiply each element by 2
print(numeric_vector + 1)  # Add 1 to each element
print(length(numeric_vector))  # Get vector length
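
Indexing is the other everyday vector operation. The sketch below uses a hypothetical scores vector to show positional, logical, and negative indexing, and what happens when types are mixed in one vector.

```r
scores <- c(85, 92, 78, 95)    # hypothetical values

first <- scores[1]             # R indexes from 1, not 0
top <- scores[scores > 90]     # logical indexing keeps matching elements
without_first <- scores[-1]    # a negative index drops that position

print(first)
print(top)
print(without_first)

# Mixing types coerces every element to the most general type (character here)
mixed <- c(1, "a", TRUE)
print(class(mixed))
```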

3.3 Lists

Lists are flexible containers that can hold elements of different types.

# Creating a list
my_list <- list(
  name = "John",
  age = 30,
  scores = c(85, 92, 78),
  active = TRUE
)

# Accessing list elements
print(my_list$name)
print(my_list[["age"]])
print(my_list[[3]])
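
Single and double brackets behave differently on lists, which is a common source of confusion. A minimal sketch with a hypothetical person list:

```r
person <- list(name = "John", age = 30)   # hypothetical list

sub_list <- person["age"]     # single brackets return a list of length 1
element <- person[["age"]]    # double brackets return the element itself

print(class(sub_list))        # "list"
print(class(element))         # "numeric"

# Add or replace an element by name
person$city <- "Paris"
print(names(person))
```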

3.4 Data Frames

Data frames are the most important data structure for data analysis in R. They are like tables with rows and columns: each column is a vector of one type, and all columns have the same length.

# Creating a data frame
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  score = c(88, 92, 85)
)

# View the data frame
print(df)
str(df)  # Structure of data frame (str() prints directly; wrapping it in print() would also print NULL)
print(summary(df))  # Summary statistics

4. Basic Operations

R provides a rich set of operators for mathematical and logical operations.

# Arithmetic operations
a <- 10
b <- 3

print(a + b)  # Addition
print(a - b)  # Subtraction
print(a * b)  # Multiplication
print(a / b)  # Division
print(a ^ b)  # Exponentiation
print(a %% b) # Modulus
print(a %/% b) # Integer division

# Comparison operations
print(a > b)   # Greater than
print(a == b)  # Equal to
print(a != b)  # Not equal to
print(a <= b)  # Less than or equal to

# Logical operations
print(TRUE & FALSE)  # AND
print(TRUE | FALSE)  # OR
print(!TRUE)         # NOT
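
The & and | operators work element-wise on vectors, while && and || compare single values and are the usual choice inside if conditions. A small sketch with made-up data:

```r
values <- c(2, 7, 12)   # hypothetical data

# Element-wise: one result per element
in_range <- values > 5 & values < 10
print(in_range)

# Single values: && evaluates left to right and stops early
x <- 7
if (x > 5 && x < 10) {
  print("x is between 5 and 10")
}

# any() and all() reduce a logical vector to a single value
print(any(in_range))
print(all(in_range))
```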

5. Control Structures

R provides standard control structures for program flow.

5.1 if Statements

age <- 20

if (age >= 18) {
  print("Adult")
} else if (age >= 13) {
  print("Teen")
} else {
  print("Child")
}

5.2 for Loops

# Iterating over a vector
fruits <- c("apple", "banana", "cherry")
for (fruit in fruits) {
  print(fruit)
}

# Using sequence
for (i in 1:5) {
  print(paste("Number:", i))
}

5.3 while Loops

count <- 1
while (count <= 5) {
  print(count)
  count <- count + 1
}
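
Loops can also skip an iteration with next or stop early with break. The sketch below sums odd numbers until the running total passes 20; both limits are arbitrary.

```r
total <- 0
for (i in 1:100) {
  if (i %% 2 == 0) {
    next          # skip even numbers
  }
  total <- total + i
  if (total > 20) {
    break         # leave the loop early
  }
}
print(total)
print(i)
```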

6. Functions

Functions in R are reusable code blocks for specific tasks.

# Basic function definition
calculate_area <- function(length, width) {
  area <- length * width
  return(area)
}

# Calling the function
result <- calculate_area(5, 3)
print(result)

# Function with default parameters
greet <- function(name = "Guest") {
  return(paste("Hello,", name))
}

print(greet("Alice"))
print(greet())  # Uses default parameter
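
Two more details are worth knowing: a function returns the value of its last expression, so return() is optional, and arguments can be passed by name in any order. The function rectangle_perimeter below is a hypothetical example, not part of R.

```r
# The last expression is the return value; width defaults to length (a square)
rectangle_perimeter <- function(length, width = length) {
  2 * (length + width)
}

print(rectangle_perimeter(5, 2))                   # positional arguments
print(rectangle_perimeter(width = 2, length = 5))  # named arguments, any order
print(rectangle_perimeter(4))                      # width falls back to length
```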

7. Data Manipulation

R excels at data manipulation. Let’s explore some basic operations.

# Sample data frame
students <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  math_score = c(85, 92, 78, 95),
  science_score = c(88, 90, 82, 96),
  grade = c("A", "A", "B", "A")
)

# Subsetting data
print(students[students$math_score > 85, ])  # Rows where math_score > 85
print(students[, c("name", "math_score")])   # Specific columns

# Adding new columns
students$total_score <- students$math_score + students$science_score
students$average_score <- students$total_score / 2

print(students)
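
Sorting and grouped summaries are the next most common steps. The following sketch uses only base R on a smaller, made-up grades data frame: order() to sort rows and aggregate() to compute a mean per group.

```r
grades <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  math_score = c(85, 92, 78, 95),
  grade = c("A", "A", "B", "A")
)

# Sort rows by math_score, highest first
sorted <- grades[order(-grades$math_score), ]
print(sorted$name)

# Mean math_score for each grade
by_grade <- aggregate(math_score ~ grade, data = grades, FUN = mean)
print(by_grade)
```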

8. Data Visualization

R has built-in plotting functions, and the ggplot2 package adds a layered grammar for more complex graphics.

# Basic plotting (base R)
# Create some sample data
x <- 1:10
y <- x^2

# Scatter plot
plot(x, y, main = "Scatter Plot", xlab = "X", ylab = "Y", col = "blue", pch = 16)

# Line plot
plot(x, y, type = "l", main = "Line Plot", xlab = "X", ylab = "Y", col = "red")

# Histogram
hist(rnorm(100), main = "Histogram", xlab = "Values", col = "lightblue")
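
Plots appear on screen by default. To save one to a file, open a graphics device, draw, and close the device. This is a minimal sketch; the file name scatter.pdf is an arbitrary choice, and tempdir() is used so nothing is written to the working directory.

```r
out_file <- file.path(tempdir(), "scatter.pdf")   # hypothetical file name

pdf(out_file, width = 6, height = 4)   # open a PDF graphics device
plot(1:10, (1:10)^2, main = "Scatter Plot", xlab = "X", ylab = "Y")
dev.off()                              # close the device to finish the file

print(file.exists(out_file))
```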

8.1 Using ggplot2 (if installed)

# Install and load ggplot2 if not already installed
# install.packages("ggplot2")
library(ggplot2)

# Create a sample data frame
plot_data <- data.frame(
  category = c("A", "B", "C", "D"),
  value = c(25, 40, 30, 35)
)

# Create a bar plot
ggplot(plot_data, aes(x = category, y = value)) +
  geom_col(fill = "steelblue") +  # geom_col() is shorthand for geom_bar(stat = "identity")
  labs(title = "Bar Plot Example", x = "Category", y = "Value") +
  theme_minimal()

9. Statistical Analysis

R is designed for statistical computing. Here are some basic statistical functions.

# Sample data
data <- c(23, 45, 67, 34, 56, 78, 89, 12, 45, 67)

# Basic statistics
print(mean(data))     # Mean
print(median(data))   # Median
print(sd(data))       # Standard deviation
print(var(data))      # Variance
print(min(data))      # Minimum
print(max(data))      # Maximum
print(summary(data))  # Min, quartiles, median, mean, and max

# Correlation
x <- 1:10
y <- x + rnorm(10)  # Add some noise
print(cor(x, y))    # Correlation coefficient

# Linear regression
model <- lm(y ~ x)
print(summary(model))
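
Beyond printing the summary, a fitted model can be queried directly. The sketch below assumes a true line of y = 2x + 1 with a small amount of noise, fixes the random seed so the run is reproducible, then extracts the coefficients and predicts new values.

```r
set.seed(42)                            # make the random noise reproducible
x <- 1:10
y <- 2 * x + 1 + rnorm(10, sd = 0.5)    # assumed true line: y = 2x + 1

model <- lm(y ~ x)
print(coef(model))                      # named vector: (Intercept) and x

# Predict y for new x values
new_points <- data.frame(x = c(11, 12))
print(predict(model, newdata = new_points))
```

The estimated slope should land close to 2, though the exact value depends on the generated noise.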

10. Working with Files

R provides functions for reading and writing files.

# Writing to a file
write.csv(students, "students.csv", row.names = FALSE)

# Reading from a file
read_data <- read.csv("students.csv")
print(read_data)

# Working with text files
writeLines(c("Line 1", "Line 2", "Line 3"), "example.txt")
text_content <- readLines("example.txt")
print(text_content)
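
A round trip through a temporary file shows writing and reading together without leaving files in the working directory. This is a minimal sketch; the scores data frame is made up, and tempfile() supplies a path that is cleaned up with the R session.

```r
scores <- data.frame(name = c("Alice", "Bob"), score = c(88, 92))

csv_path <- tempfile(fileext = ".csv")   # temporary path for this session
write.csv(scores, csv_path, row.names = FALSE)

read_back <- read.csv(csv_path)
print(read_back)
print(file.exists(csv_path))
```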

11. Packages and Libraries

R’s power comes from its extensive package ecosystem.

# Installing packages
# install.packages("dplyr")  # For data manipulation
# install.packages("ggplot2") # For visualization

# Loading packages
library(dplyr)
library(ggplot2)

# Using dplyr for data manipulation (students is the data frame from section 7)
students %>%
  filter(math_score > 85) %>%
  select(name, math_score) %>%
  arrange(desc(math_score))
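
Calling library() on a package that is not installed stops the script with an error. One way to guard against that, sketched here with dplyr as the example package, is to check with requireNamespace() first. Functions can also be called with the package::function form without loading the whole package.

```r
# Check for a package before loading it; "dplyr" is only the example here
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  message("dplyr loaded")
} else {
  message("dplyr is not installed; run install.packages(\"dplyr\")")
}

# The stats package ships with R, so this call needs no installation
print(stats::median(c(1, 3, 5)))
```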

12. Error Handling

R handles errors with tryCatch(), which runs a handler when an error occurs, and try(), which lets execution continue after a failure.

# Basic error handling
tryCatch({
  result <- 10 / "a"  # Raises an error; 10 / 0 would not, because it returns Inf
  print(result)
}, error = function(e) {
  print(paste("Error occurred:", e$message))
}, finally = {
  print("This always executes")
})

# Using try()
result <- try(log("a"), silent = TRUE)  # log() of a string raises an error
if (inherits(result, "try-error")) {
  print("Calculation failed")
}
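
Your own functions can raise errors with stop(), and tryCatch() can return a fallback value or handle warnings as well. In the sketch below, safe_divide is a hypothetical helper written for this example.

```r
safe_divide <- function(a, b) {       # hypothetical helper
  if (b == 0) {
    stop("division by zero")          # raise an error explicitly
  }
  a / b
}

outcome <- tryCatch(
  safe_divide(10, 0),
  error = function(e) {
    message("Caught: ", conditionMessage(e))
    NA                                # value returned when an error occurs
  }
)
print(outcome)

# Warnings can be handled the same way
parsed <- tryCatch(as.numeric("abc"), warning = function(w) NA_real_)
print(parsed)
```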

13. Advanced Topics

13.1 Apply Family Functions

The apply family of functions applies a function over the rows or columns of a matrix, or over the elements of a list, without an explicit loop.

# Create a matrix
mat <- matrix(1:12, nrow = 3, ncol = 4)
print(mat)

# Apply functions
print(apply(mat, 1, mean))  # Row means
print(apply(mat, 2, sum))   # Column sums

# lapply for lists
my_list <- list(a = 1:5, b = 6:10, c = 11:15)
print(lapply(my_list, mean))  # Mean of each list element

# sapply works like lapply but simplifies the result to a vector when possible
print(sapply(my_list, mean))
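
Two common variations are vapply(), which checks the type and length of each result, and passing an anonymous function. A short sketch, using a list named number_sets that mirrors the one above:

```r
number_sets <- list(a = 1:5, b = 6:10, c = 11:15)

# vapply is like sapply but requires each result to be one numeric value
means <- vapply(number_sets, mean, FUN.VALUE = numeric(1))
print(means)

# An anonymous function computes max minus min for each element
widths <- sapply(number_sets, function(v) max(v) - min(v))
print(widths)
```

vapply() fails loudly if a result has an unexpected shape, which makes it a safer choice inside functions and scripts.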

13.2 String Manipulation

text <- "Hello R Programming"

print(toupper(text))    # Convert to uppercase
print(tolower(text))    # Convert to lowercase
print(nchar(text))      # Count characters
print(substr(text, 1, 5))  # Extract substring

# Using stringr package (if installed)
# library(stringr)
# str_split(text, " ")  # Split string
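
Base R covers splitting, joining, and pattern replacement without extra packages. A minimal sketch on the same example string:

```r
text <- "Hello R Programming"

words <- strsplit(text, " ")[[1]]     # strsplit returns a list; take element 1
print(words)

print(paste(words, collapse = "-"))   # join with a separator
print(gsub("R", "Python", text))      # replace every match of a pattern
print(grepl("Program", text))         # does the pattern occur?
print(sprintf("%s has %d characters", text, nchar(text)))
```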

13.3 Date and Time

# Current date and time
current_time <- Sys.time()
print(current_time)

# Formatting dates
formatted_date <- format(current_time, "%Y-%m-%d")
print(formatted_date)

# Date arithmetic
today <- Sys.Date()
future_date <- today + 30
print(future_date)
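
Dates stored as text need to be parsed before arithmetic works. The sketch below uses two arbitrary dates: one in ISO format, which as.Date() reads directly, and one in month/day/year format, which needs a format string.

```r
start <- as.Date("2024-01-15")                       # ISO format parses directly
end <- as.Date("03/01/2024", format = "%m/%d/%Y")    # other layouts need a pattern

days_between <- as.numeric(end - start)              # difference in days
print(days_between)

print(format(start, "%d %B %Y"))                     # day, full month name, year
print(weekdays(start))
```

Month and weekday names are printed in the language of the current locale.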

14. Best Practices

Here are some tips for writing better R code:

  • Use meaningful variable names
  • Comment your code appropriately
  • Use vectorized operations instead of loops when possible
  • Load packages at the beginning of your script
  • Use consistent indentation (2 spaces recommended)
  • Test your code with sample data
  • Use version control for your projects

15. Next Steps

To continue learning R:

  1. Practice: Work on small data analysis projects
  2. Explore Packages: Learn dplyr, ggplot2, tidyr, and other essential packages
  3. Online Resources: Use R documentation, Stack Overflow, and R-bloggers
  4. Books: “R for Data Science” by Hadley Wickham and Garrett Grolemund
  5. Courses: Take online courses on data analysis with R

R is a powerful tool for statistical computing and data analysis. With practice, you’ll be able to perform complex data manipulations, create beautiful visualizations, and conduct sophisticated statistical analyses.