What are frequencies?
Frequencies are counts of things in different categories. For example, 417 people were asked to choose their favorite pet from a list of animals. Here's the data:
There are five categories. The numbers are the counts of people who chose the animal. Dogs are top, naturally.
The same data is often presented in a bar chart, like this:
Sometimes the categories are ranges of numbers, like this:
It's still categories and counts. The pattern for the code is the same.
In this lesson, you're going to learn how to take raw data like this:
dog dog cat llama bird cat dog
… and compute the frequencies.
A frequency array
We'll keep the frequency data in an array. Declare it like this:
Dim petFreqs(5) as Integer
We know there are five categories, so we can declare the array to be exactly that length.
Here's what the pet data will look like:
petFreqs(1) is the frequency for birds.
petFreqs(2) is the frequency for cats. And so on.
A frequency program
Let's write a program to read a file of favorite pet data, and compute frequencies. You can download the worksheet and some data files at the bottom of this page.
The worksheet looks like this to start:
Click Run, and the program will read in the raw data…
dog dog cat llama bird cat dog
Here's some output:
The overall structure of the program will be:
Extract Process Output
rawData is for the raw data read from the file.
petFreqs is for the frequencies, that is, the counts for each pet type.
petFreqs(1) will be the number of birds,
petFreqs(2) the number of cats, and so on.
There are calls to three subs. This one…
readPetData rawData, numDataElements
…reads the raw data from the data file, and returns it in
numDataElements has the number of data elements read. Which data file? The one the user names:
The second sub call:
computeFreqs rawData, numDataElements, petFreqs
It takes the
rawData and runs through it, counting the number of each pet type. It returns the result in
The last sub call:
outputFreqs petFreqs, numDataElements
It shows the results to the user.
Notice how simple and chunky the main program is:
Get pet data from a file
Here's the sub that gets the data from the file:
Get the file name from the worksheet. Open the file. While there is more data to read: Read one raw data element. Append it to the array.
Excel needs to know which folder the file is in. It looks in the same folder where the workbook is:
Open ThisWorkbook.Path & "\" & fileName For Input As #1
Remember to close the file. If you forget, and try to run the program again, you'll set a "File already open" error. Open the immediate window (CTRL+G), type…
… and press Enter.
Here's the sub to compute the frequencies:
petFreqs array to store the counts. The counts have to start at zero, so we need to make sure all of
petFreqs the elements start at zero. So:
rawDataIndex to 1, then runs the body of the loop. It makes
rawDataIndex 2, then runs the body of the loop. It makes
rawDataIndex 3, then runs the body of the loop. Etc. It ends the loop at
numDataElements. That was computed earlier, by
On line 53…
pet = rawData(rawDataIndex)
… the CPU grabs an element from
rawData, whichever one
rawDataIndex says. It puts the value into the variable
On line 54…
If pet = "bird" Then
… the CPU tests whether
pet is "bird". If it is, add one to the count of birds. The count of birds is in
petFreqs(1), so this will do the job:
countIndex = 1 ... petFreqs(countIndex) = petFreqs(countIndex) + 1
If the pet was a dog, we need to update a different array element:
countIndex = 3 ... petFreqs(countIndex) = petFreqs(countIndex) + 1
If statements run through the pet types, and set
countIndex to one thing or another, depending on the pet. Then
countIndex is used to update the right array element.
petFreqs(1), and not
petFreqs(4), or whatevs?
Cells(7, 1) = "Bird" Cells(7, 2) = petFreqs(1)
The code assumes that
petFreqs(1) is there the bird data is.
Want to put the bird count in
petFreqs(4)? No big:
For index = 1 To numDataElements pet = rawData(index) If pet = "bird" Then petFreqs(4) = petFreqs(4) + 1 ElseIf pet = "cat" Then ... Cells(7, 1) = "Bird" Cells(7, 2) = petFreqs(4)
We just need to be consistent.
ElseIf pet = "llama" Then petFreqs(5) = petFreqs(5) + 1 End If Next index
There's a bunch of
Ifs that just ends.
In the programs we've written earlier, the input sub checked for valid data. That's what should happen here, too.
readPetData should check for unknown pets, and not add that data to
Two index variables
There are two arrays. One for the raw data:
rawData. One for the counts:
petFreqs. Arrays are used with indexes, and the same index values mean different things for the arrays.
rawData(3) is what one person said their favorite pet is. Cat, dog, bird, whatevs.
petFreqs(3) has the index 3 as well, but it's the count for a particular pet (it's dog for our program).
Let's think about the code. We could use one index variable, like this:
Dim index as Integer ... For index = 1 To numDataElements pet = rawData(index) If pet = "bird" Then index = 1 ... petFreqs(index) = petFreqs(index) + 1
index is being used for two things. An index into
pet = rawData(index)
.. and an index into
petFreqs(index) = petFreqs(index) + 1
Code like this will make you scream. It might work, or might not. Even if it does, it's easy for bugs to creep in.
Better to use two different index variables, one for each array:
Dim rawDataIndex As Integer 'Used with raw data array. Dim countIndex As Integer 'Used with freq array.
The code looks like this:
For rawDataIndex = 1 To numDataElements pet = rawData(rawDataIndex) If pet = "bird" Then countIndex = 1 ... petFreqs(countIndex) = petFreqs(countIndex) + 1
Ah! Less confusion. Less screaming. Your dog will thank you.
Here's part of the main program:
computeFreqs sends out
petFreqs, filled with juicy data.
outputFreqs takes that data, and shows it to the user.
Computing frequency categories
In the pets programs, we used a bunch of
If statements to compute the index, that is, to compute which category of pet a person had chosen:
If pet = "bird" Then countIndex = 1 ElseIf pet = "cat" Then countIndex = 2 ElseIf pet = "dog" Then countIndex = 3 ElseIf pet = "fish" Then countIndex = 4 ElseIf pet = "llama" Then countIndex = 5 End If petFreqs(countIndex) = petFreqs(countIndex) + 1
Sometimes it's easier to use arithmetic to compute indexes. Suppose there's a credit score that ranges from 100 to 500. The file
scores.txt has scores for a bunch of humans:
322 377 317 286 336 225 125 200 265
Let's write a program that computes frequencies in groups of 100, like this:
You can download the worksheet and some data files at the bottom of this page.
We'll have an array
scoreFreqs, with four elements.
scoreFreqs(1) will count the number of scores from 100 to 199.
scoreFreqs(2) will count the number of scores from 200 to 299. And so on.
It's almost the same as the pets program. Most of the differences are in the
If statements from pets.
Let's see how it works. Suppose
score is 188. That should be in the first category, 100 to 199.
score / 100 is 1.88. The
Int() function makes that an integer, chopping off the 0.88. So
Int(188 / 100) is 1. That's the right category.
score is 372. That should be in the third category, 300 to 399.
score / 100 is 3.72. The
Int() function makes that an integer, chopping off the 0.72. So
Int(372 / 100) is 3. That's the right category.
There's just one fly in the ointment. The highest score is 500. That supposed to be in category 4: 400 to 500. But
Int(500 / 100) is 5. The formula doesn't work.
500 is a special case, also called an edge case. The easiest way to deal with it is to add an
Here's the output code:
Cells(countIndex + 6, 1) = label
The label will be something like "From 200 to 299:".
Lines 72 to 77 compute the label. There's an
If to handle the edge case.
5934,124,Free 5935,20,Blue jay 6851,51,Stork 5242,192,Stork 6232,133,Free
Each record is data about one customer. The first column is a customer id. The second column is that customer's purchases over the last three months. The third column is the customer's favorite bird. The only possible choices are:
Cardinal Blue jay Stork Larry Free
Write a program that computes the frequencies of each favorite bird, but only for customers who purchased $100 or more. The data file name should be in the spreadsheet. The output is to the spreadsheet. Like this:
The program should still work if the number of records in the file changes.
Upload your work. The usual coding standards apply.
(If you were logged in as a student, you could submit an exercise solution, and get some feedback.)