Street-Smart Stats
Lesson 1 Introduction to Statistical Research Methods
Lesson 2 Visualizing Data
Lesson 3 Central Tendency
Lesson 4 Variability
Lesson 5 Standardizing
Lesson 6 Normal Distribution
Lesson 7 Sampling Distributions
Lesson 8 Estimation
Lesson 9 Hypothesis Testing
Lesson 10 t-Tests for Dependent Samples
Lesson 11 t-Tests for Independent Samples
Lesson 12 Intro to One-Way ANOVA
Lesson 13 One-Way ANOVA: Test significance of differences
Lesson 14 Correlation
Lesson 15 Linear Regression
Lesson 16 Chi-Squared Tests
Afterword
Index

What is statistics? Only one of the most awesome types of math EVER! Statistics helps us make sense of the world around us by providing methods to describe and analyze data. Data comes in many forms:

Type of data | Description | Examples
Discrete | Data that can take only whole-number values | Number of people, number of countries you’ve visited, price in cents
Continuous | Data that can take any value within a range, including values between whole numbers | Height, distance, length of time
Nominal | Data that name categories, with no inherent order | Type of relative (mom, aunt, uncle, cousin, grandfather), type of car
Ordinal | Data that describe a rank or order | Year in school (1st grade, 2nd grade), choice on a rating scale (e.g., 1-5), months
Interval | Numerical, ordinal data with evenly spaced intervals | Degrees Fahrenheit
Ratio | Interval data where 0 has a clear meaning (none of the quantity) | Length of time
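One way to make the table concrete is to see how these types might be stored in R, the software used in this book’s tutorials. The sketch below is an illustration under assumptions, not a convention from this lesson: the values and variable names are made up, and it uses R’s integer and numeric vectors for discrete and continuous data, and factors (ordered factors for ordinal data) for categories. It also shows why the interval/ratio distinction matters: a ratio of Fahrenheit temperatures changes when you convert to Celsius, while a ratio of lengths of time does not.

```r
# Hypothetical example values for some of the types of data in the table
countries_visited <- c(3L, 0L, 12L)                   # discrete: whole numbers only
height_cm <- c(162.5, 170.2, 181.0)                   # continuous: any value in a range
relative <- factor(c("mom", "aunt", "cousin"))        # nominal: unordered categories
year_in_school <- factor(c("2nd grade", "1st grade", "2nd grade"),
                         levels = c("1st grade", "2nd grade"),
                         ordered = TRUE)              # ordinal: ranked categories

class(countries_visited)   # "integer"
class(height_cm)           # "numeric"
class(relative)            # "factor"
class(year_in_school)      # "ordered" "factor"

# Ranks can be compared for ordinal data: "2nd grade" comes after "1st grade"
year_in_school[1] > year_in_school[2]                 # TRUE

# Interval vs. ratio: converting Fahrenheit to Celsius moves the zero point,
# so ratios of temperatures depend on the unit
temp_f <- c(40, 80)
temp_c <- (temp_f - 32) * 5 / 9
temp_f[2] / temp_f[1]      # 2
temp_c[2] / temp_c[1]      # about 6, so "twice as hot" is not meaningful

# Length of time has a true zero, so ratios do not depend on the unit
minutes <- c(10, 20)
seconds <- minutes * 60
minutes[2] / minutes[1]    # 2
seconds[2] / seconds[1]    # 2
```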

This book focuses on describing and analyzing continuous data, but in Lesson 16 we get into categorical data.

The examples of each type of data above are variables. The actual data values collected for each variable vary; for example, if we recorded the temperature every evening at 8:00 pm (where the variable is “degrees in Fahrenheit”), we would record many different values.

Always analyze data with a critical eye. You should know how a survey or experiment was conducted, who is in the sample, and how the variables are measured. This textbook will help you develop a sense for numbers so that you can tell if something fishy is going on.

For one thing, it’s crucial that you always know exactly how variables are measured. If the variable is height, are you measuring in inches? Centimeters? It’s relatively easy to determine a unit of measurement for height, but what about the variable happiness? Happiness, love, ambition, etc. are examples of constructs, variables that can’t be easily defined or measured.

Everyone will have a different definition and way of measuring each construct. When doing statistical research, we’ll need to specify these.

For constructs, we have to determine an operational definition; in other words, a precise way of measuring them. For example, we could measure happiness by the number of times a person smiles or laughs each day. Throughout this book, you’ll use operational definitions to analyze variables and the relationships between them.

You’ll also learn how to draw conclusions about an entire population (e.g., all residents of the United States) based on data from the whole population or from a sample of that population (e.g., 5000 randomly selected residents of the United States). To draw inferences about a population, the sample must be representative of it; for this to happen, the sample should be chosen randomly, meaning that each member of the population has an equal chance of being chosen to be in the sample.
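To see what “each member has an equal chance of being chosen” looks like in practice, here is a minimal simulation sketch using R’s built-in sample() function. The five-member population, the sample size, and the number of repetitions are arbitrary assumptions. When samples of size n = 2 are drawn at random from k = 5 members, each member should end up in roughly n/k = 2/5 = 40% of the samples.

```r
set.seed(1)                                # make the simulation reproducible
population <- c("A", "B", "C", "D", "E")   # hypothetical population of k = 5 members
n <- 2                                     # sample size
reps <- 10000                              # number of random samples to draw

# Each call to sample() draws one random sample of size n;
# replicate() repeats this reps times (one sample per column of the result)
draws <- replicate(reps, sample(population, n))

# Proportion of the samples that included each member
chosen <- table(factor(as.vector(draws), levels = population)) / reps
round(chosen, 3)   # each value should be close to n / k = 2 / 5 = 0.4
```

If one member showed up much more often than the others, the sampling would not be random in the sense described above.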

So let’s dive in! How do we describe a group of numbers? With numbers! Usually we can choose one or several numbers to describe a large group of numbers (we’ll go into detail on this in Lessons 3 and 4). Parameters are numbers that describe a population, while sample statistics are numbers that describe a sample. Coming up, you’ll use R to generate random samples from a population. In Lesson 3 you’ll see that sample statistics from random samples approximate population parameters better than statistics from non-random samples do.
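As a small preview, the sketch below contrasts a parameter with a sample statistic. It treats the eight heights from this lesson’s R tutorial as the whole population and uses the mean (covered in Lesson 3) as the example; that choice of statistic and the seed value are assumptions for illustration only.

```r
# Treat these eight heights as the entire population
height <- c(64, 72, 60, 69, 68, 71, 64, 73)

population_mean <- mean(height)     # a parameter: describes the whole population
set.seed(42)                        # reproducible example; any seed works
my_sample <- sample(height, 4)      # a random sample of size n = 4
sample_mean <- mean(my_sample)      # a sample statistic: describes only the sample

population_mean    # 67.625
sample_mean        # changes from one random sample to the next
```

The parameter stays fixed, while the sample statistic depends on which members happened to be drawn.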

We’ll have a lot of R tutorials throughout this book. R is a free, open-source statistical program that is great for doing statistical analysis. You can download R at www.r-project.org. In our first tutorial, you’ll use R to choose a random sample of size n (where n could be 1, 2, 3, …, up to k) from a population of size k.

R Tutorial: Choosing a random sample of size n from a population of size k

Input your data into R. You have two options:

  1. Input it manually by typing each value inside c(), separated by commas (c stands for “combine”; it combines the values into a single vector):

    population = c(x1, x2, x3, x4, …, xk)
    Let’s say our population consists of the numbers 64, 72, 60, 69, 68, 71, 64, and 73. Then, instead of x1, x2, etc., we would type in these numbers. We also don’t have to call our population “population”; in this case, we’ll call it “height”.

    Once you have input your data, the command sample(var_name, n) will generate a random sample of size n. (Replace var_name with the name of your variable, and n with the size of the sample you want to draw. By default, sample() draws without replacement, so n can be at most k.)

    In this randomly generated sample, R chose the following four from the population (in orange):

    64    72    60    69    68    71    64    73

  2. Input your data from a csv (comma-separated values) file. By default, R reads and saves files in a specific folder called the working directory. Type getwd() to find the working directory, and then make sure your csv is saved there (or use setwd() to change the working directory).

    Your file should have a header row, with a name at the top of each column; R uses these as the names of the variables. (If your file has no header row, use header = FALSE instead, and R will name the columns V1, V2, and so on.) In R, type the command:

    population = read.csv(file = "[name of file].csv", header = TRUE, sep = ",")

    This reads the csv file into a data frame (here called population) and uses each column header as the name of a variable.

    attach(population)

    This command lets you refer to each variable by its column name alone (for example, height instead of population$height). Now that you’ve input the data, you can generate a random sample the same way you did earlier.

    This time, R randomly chose these four:

    64    72    60    69    68    71    64    73
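Putting the two options together, here is a minimal sketch you can paste into R. It assumes a hypothetical file name, heights.csv, and writes that file into a temporary folder (tempdir()) so the example runs as-is; with your own data, you would save your csv in the working directory and read it by name instead. The sketch refers to the column as population$height rather than calling attach(): both work, but here a vector named height already exists, and attaching a data frame with a column of the same name would make R report that one object masks the other.

```r
# Option 1: enter the data manually with c() ("combine")
height <- c(64, 72, 60, 69, 68, 71, 64, 73)
sample(height, 4)            # random sample of n = 4 values, drawn without replacement

# Option 2: read the data from a csv file.
# To keep this sketch self-contained, first write a small csv to a temporary file
# (hypothetical name; in practice your csv would already be in your working directory).
csv_path <- file.path(tempdir(), "heights.csv")
write.csv(data.frame(height = height), csv_path, row.names = FALSE)

population <- read.csv(file = csv_path, header = TRUE, sep = ",")
names(population)            # "height": the header row becomes the variable name

# Instead of attach(), refer to the column with $
sample(population$height, 4)

# sample() draws without replacement by default, so n cannot exceed k;
# set replace = TRUE to allow the same member to be drawn more than once
sample(population$height, 10, replace = TRUE)
```

Each call to sample() returns a different random subset. If you want to reproduce the same sample later, call set.seed() with any number before sampling.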
