Often you want to process some of the data in a data set. Two main reasons.
Dirty data
Data from the RL is often dirty. For example, suppose you had this Web form:
A bunch of people filled it in. Their responses were copied to an Excel worksheet.
Here are some things people might type:
1
None
two
six lovely kitties
2 real, 2 stuffed
Ack! You need to run through the data, and pick out the values that make sense, before you can do any analysis.
Subsets
Sometimes you want to analyze a subset of the data. For example, suppose you have this sales data:
Maybe you want to process only data from the North region.
Filtering
What you want to do is data filtering.
Marking bad data
Let's start by going through a data set, and marking bad data.
Here's that sales data again:
We want a program that goes through the data, and checks that each entry is OK. Rules:
Region should be North, West, or Central. No other values are valid, although any mix of upper- and lowercase is OK.
Sales should be numeric, and zero or greater.
If any of the rules are violated, mark that cell, giving it a red background.
There's add one other rule-like-thing. Almost all sales are less than 500. If the program detects a sales value 500 or more, it will set the background color of the cell to yellow. Not definitely an error, but a suspicious value that should be checked.
Warning if sales 500 or more.
Here's all the data, and a button:
There might not be 20 rows, though. There might be 28, 19, whatevs.
Here's what the program should do:
Lily writes code
Tara
Lily, could you write the program, please?
Lily
OK. Let's see. Read through the data, check each cell, and apply those rules.
I don't know how many rows there are, so I'll use the pattern we just learned about to run through the data.
Patterns are Good Things. They're solutions you can reuse. That's what expert programmers do.
Lily
I'll start by copying that code, and see what needs to be changed.
Dim rows as Integer
rows = 0
Do While Cells(rows + 1, 1) <> ""
rows = rows + 1
Loop
Lily
Now what? Hmm… OK, I know. This is like the input validation we did before, except that instead of showing as error message, the cell's color is changed.
Let me check that pattern. I'll look in the pattern catalog:
Ah! Here it is:
OK, let's see… Here's some code I can use:
It won't fit exactly. It never does. But I can adapt it.
Lily is mixing patterns. She's putting the validation pattern inside the loop of the find-the-end-of-the-data pattern.
Dim rows as Integer
rows = 0
Do While Cells(rows + 1, 1) <> ""
rows = rows + 1
userInput = InputBox( "What is the amount?" )
If Not IsNumeric(userInput) Then
MsgBox "Sorry, amount must be a number."
End
ElseIf userInput <= 0 Then
MsgBox "Sorry, amount must be greater than zero."
End
EndIf
Loop
Lily
There's no InputBox, though (line 4). The data isn't being typed in. It's coming from a cell. Let's see… The program has just tested Cells(rows + 1, 1) (line 3), so if the program gets to line 4, then that cell isn't blank. I guess we can test that cell.
Dim rows as Integer
rows = 0
Do While Cells(rows + 1, 1) <> ""
rows = rows + 1
userInput = Cells(rows, 1)
If Not IsNumeric(userInput) Then
Lily
The If (line 6) won't change. I'll just leave it alone.
But the way the program shows errors is different.
Dim rows as Integer
rows = 0
Do While Cells(rows + 1, 1) <> ""
rows = rows + 1
userInput = Cells(rows, 1)
If Not IsNumeric(userInput) Then
MsgBox "Sorry, amount must be a number."
End
Lily
I don't want to show a error message. I want to change the background color of the cell. How do you do that? I don't remember that anywhere.
I'll check out the BFF page … Search on the page for "background".
Lily
Oh, that's right, I remember now. Interior.Color.
Dim rows as Integer
rows = 0
Do While Cells(rows + 1, 1) <> ""
rows = rows + 1
userInput = Cells(rows, 1)
If Not IsNumeric(userInput) Then
Cells(rows, 1).Interior.Color = vbRed
End
End If
Lily
I don't want that End (line 8), because I want the program to keep going when it finds a mistake. So it can find other mistakes.
I'll add the test for negative sales as well.
Dim rows as Integer
rows = 0
Do While Cells(rows + 1, 1) <> ""
rows = rows + 1
userInput = Cells(rows, 1)
If Not IsNumeric(userInput) Then
Cells(rows, 1).Interior.Color = vbRed
ElseIf userInput < 0 then
Cells(rows, 1).Interior.Color = vbRed
End If
Loop
Lily
Let me run it.
Some good and bad here. Good thing: Lily decided to test what she has so far. Write a bit, test a bit, write a bit, test a bit…
Bad thing: before testing, Lily should have quickly read through the code. She might have picked up some mistakes. Quicker than having to go through testing cycles.
Lily
Oh, yeah.
Lily
Oops! The numeric checks are supposed to be on column 2.
Option Explicit
Private Sub cmdCheck_Click()
Dim rows As Integer
Dim userInput As String
rows = 0
Do While Cells(rows + 1, 1) <> ""
rows = rows + 1
userInput = Cells(rows, 2)
If Not IsNumeric(userInput) Then
Cells(rows, 2).Interior.Color = vbRed
ElseIf userInput < 0 Then
Cells(rows, 2).Interior.Color = vbRed
End If
Loop
End Sub
Lily
Hey! Not bad.
Except B1. I need to start the loop at the second row. Let's see…
Oo! I can initialize rows to one, then the code will check row two.
rows = 1
Lily
Now, add the check for the warning for sales that could be too high.
Option Explicit
Private Sub cmdCheck_Click()
Dim rows As Integer
Dim userInput As String
rows = 0
Do While Cells(rows + 1, 1) <> ""
rows = rows + 1
userInput = Cells(rows, 2)
If Not IsNumeric(userInput) Then
Cells(rows, 2).Interior.Color = vbRed
ElseIf userInput < 0 Then
Cells(rows, 2).Interior.Color = vbRed
ElseIf userInput >= 500 Then
Cells(rows, 2).Interior.Color = vbYellow
End If
Loop
End Sub
Lily
OK! Now add the test for the region. Wait… OK, uppercase/lowercase doesn't matter. I'll add that trim thing, too.
Option Explicit
Private Sub cmdCheck_Click()
Dim rows As Integer
Dim userInput As String
rows = 0
Do While Cells(rows + 1, 1) <> ""
rows = rows + 1
userInput = Cells(rows, 2)
If Not IsNumeric(userInput) Then
Cells(rows, 2).Interior.Color = vbRed
ElseIf userInput < 0 Then
Cells(rows, 2).Interior.Color = vbRed
ElseIf userInput >= 500 Then
Cells(rows, 2).Interior.Color = vbYellow
End If
userInput = Cells(rows, 1)
userInput = LCase(Trim(userInput))
If userInput <> "north" And userInput <> "west" _
And userInput <> "central" Then
Cells(rows, 1).Interior.Color = vbRed
End If
Loop
End Sub
Tara
Yay! Good job!
Could you add some comments?
Option Explicit
Private Sub cmdCheck_Click()
Dim rows As Integer
Dim userInput As String
'Initialize.
rows = 1
Do While Cells(rows + 1, 1) <> ""
rows = rows + 1
'Check sales.
userInput = Cells(rows, 2)
If Not IsNumeric(userInput) Then
Cells(rows, 2).Interior.Color = vbRed
ElseIf userInput < 0 Then
Cells(rows, 2).Interior.Color = vbRed
ElseIf userInput >= 500 Then
'Show warning.
Cells(rows, 2).Interior.Color = vbYellow
End If
'Check the region.
userInput = Cells(rows1, 1)
userInput = LCase(Trim(userInput))
If userInput <> "north" And userInput <> "west" _
And userInput <> "central" Then
Cells(rows, 1).Interior.Color = vbRed
End If
Loop
End Sub
Keeping track
Let's add to Lily's code, so that it keeps track of the number of errors. This is what we want:
You program can't be sure how many day's data are in the worksheet.
Add code that will mark large price changes. When the daily price rises by 2 or more, mark the day green. When the price falls by 2 or more, mark it red. Like this:
The first row does not contain data.
Upload your solution. The usual coding standards apply.
When you submit, please answer this question: What did you learn from this exercise? Type in the text area on the solution form.
(If you were logged in as a student, you could submit an exercise solution, and get some feedback.)
Exercise: Find the quackers
You want to hire some ducks for your new movie, "The Night of the Zombie Ducks." You go to your local duck pond, and ask the ducks to text you the number of times they quack each day. Through the Magic of Technology, the data is collected in an Excel workbook. Each text has the sender's DuId (duck id) added to it automatically.
Here's a sample:
Duck beaks aren't the best for texting, so there could be bad data. Also, some ducks didn't follow the instructions, as in row 7.
Download the workbook, so you can add your own code. Write a program that does two things:
Draws lines through rows that have bad data. Quacks per day could be nonnumeric, or less than zero.
Mark the ducks that quack 70 or more times per day.
Here's some output:
Look at row 7. Both cells have strikethroughs.
Hints:
The first row doesn't have data in it.
Useful code you can adapt: Cells(7, 7).Font.Strikethrough = True
Upload your workbook. The usual coding standards apply, as always.
In the text box on the submission form, please answer this question: What did you learn from this exercise?
(If you were logged in as a student, you could submit an exercise solution, and get some feedback.)