forked from rdpeng/RepData_PeerAssessment1
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathPA1_template.Rmd
More file actions
141 lines (120 loc) · 4.12 KB
/
PA1_template.Rmd
File metadata and controls
141 lines (120 loc) · 4.12 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
---
title: "Reproducible Research: Peer Assessment 1"
output:
html_document:
keep_md: true
---
## Loading and preprocessing the data
```{r reading-data}
activity <- read.csv("activity/activity.csv")
summary(activity)
```
## What is mean total number of steps taken per day?
```{r histogram-total-steps}
totalSteps <- with(activity, aggregate(steps, list(date = date), sum))
hist(totalSteps$x, xlab = "Total number of steps taken each day",
main = "Histogram of total number of steps")
```
The mean total number of steps is
```{r}
mean(totalSteps$x, na.rm = T)
```
and the median number of steps is
```{r}
median(totalSteps$x, na.rm = T)
```
## What is the average daily activity pattern?
```{r average-daily-activity}
avgStepsPerInterval <- with(activity,
aggregate(steps, list(interval = interval),
function(x) mean(x, na.rm = T)))
with(avgStepsPerInterval, plot(interval, x, type = "l", ylab = "", xlab = ""))
title(main = "Average number of steps taken, averaged across all days",
ylab = "Average number of steps taken",
xlab = "Interval")
```
The interval that contains maximum number of steps is
```{r}
maxIndex <- which.max(avgStepsPerInterval$x)
avgStepsPerInterval$interval[maxIndex]
```
## Imputing missing values
The number of rows containing missing values is
```{r}
sum(!complete.cases(activity))
```
The missing values are replaced by the mean number of steps for the interval
across all days (or 0).
```{r}
cleanedActivity <- activity
cleanedActivity$steps <- as.numeric(apply(activity, 1, function(row) {
x <- row['steps']
interval <- row['interval']
if(is.na(x)) {
index <- which(
avgStepsPerInterval$interval == as.numeric(interval))
if(length(index) > 0) {
x <- avgStepsPerInterval$x[index[1]]
} else {
x <- 0
}
}
x
}))
summary(cleanedActivity)
```
Let's save the cleaned data set.
```{r}
write.csv(cleanedActivity, file = "activity/cleaned_activity.csv",
row.names = F)
```
Cleaning the data changed a little the histogram of the total number of steps,
because now there are more *average* values.
```{r histogram-cleaned-total-steps}
totalStepsCleaned <- with(cleanedActivity,
aggregate(steps, list(date = date), sum))
hist(totalStepsCleaned$x, xlab = "Total number of steps taken each day",
main = "Histogram of total number of steps")
```
The mean total number of steps in the cleaned data set is
```{r}
mean(totalStepsCleaned$x)
```
and the median total number of steps in the cleaned data set is
```{r}
median(totalStepsCleaned$x)
```
## Are there differences in activity patterns between weekdays and weekends?
Determine which day is a weekday or a weekend day.
```{r}
cleanedActivity$day <- as.factor(apply(cleanedActivity, 1, function(row) {
date <- as.POSIXct(as.character(row['date']), format = "%Y-%m-%d")
weekdayNr <- as.integer(strftime(date, "%u"))
weekday <- "weekday"
if(weekdayNr == 6 | weekdayNr == 7) {
weekday <- "weekend"
}
weekday
}))
```
Compute mean of steps taken per each interval and each day type.
```{r}
avgStepsPerIntervalAndWeekday <-
with(cleanedActivity, aggregate(steps, list(interval = interval,
day = day),
function(x) mean(x, na.rm = T)))
cleanedActivity$stepsAvg <- apply(cleanedActivity, 1, function(row) {
interval <- as.integer(row['interval'])
day <- row['day']
index <- which(avgStepsPerIntervalAndWeekday$interval == interval &
avgStepsPerIntervalAndWeekday$day == day)
avgStepsPerIntervalAndWeekday$x[index]
})
```
```{r avg-steps-interval-weekday}
library(lattice)
plotData <- unique(cleanedActivity[,c(3,4,5)])
xyplot(stepsAvg ~ interval | day, data = plotData, type = "l",
layout = c(1, 2), xlab = "Interval", ylab = "Number of steps",
main = "Average number of steps taken")
```