Keywords:

# Just some of the data

Often you want to process some of the data in a data set. Two main reasons.

## Dirty data

Data from the RL is often dirty. For example, suppose you had this Web form: A bunch of people filled it in. Their responses were copied to an Excel worksheet.

Here are some things people might type:

• 1
• None
• two
• six lovely kitties
• 2 real, 2 stuffed

Ack! You need to run through the data, and pick out the values that make sense, before you can do any analysis.

## Subsets

Sometimes you want to analyze a subset of the data. For example, suppose you have this sales data: Maybe you want to process only data from the North region.

## Filtering

What you want to do is data filtering.

Let's start by going through a data set, and marking bad data.

Here's that sales data again: We want a program that goes through the data, and checks that each entry is OK. Rules:

• Region should be North, West, or Central. No other values are valid, although any mix of upper- and lowercase is OK.
• Sales should be numeric, and zero or greater.

If any of the rules are violated, mark that cell, giving it a red background.

There's add one other rule-like-thing. Almost all sales are less than 500. If the program detects a sales value 500 or more, it will set the background color of the cell to yellow. Not definitely an error, but a suspicious value that should be checked.

• Warning if sales 500 or more.

Here's all the data, and a button: There might not be 20 rows, though. There might be 28, 19, whatevs.

Here's what the program should do: ## Lily writes code Tara
Lily, could you write the program, please? Lily
OK. Let's see. Read through the data, check each cell, and apply those rules.

I don't know how many rows there are, so I'll use the pattern we just learned about to run through the data.

Pattern

Find the end of a data set

Situation:
The number of items in a data set varies. Maybe 502 one day, and 534 the next. You want to find the number of items.
Actions:
Use a Do loop and an index variable. Check cells one by one, until there's an empty cell.

Patterns are Good Things. They're solutions you can reuse. That's what expert programmers do. Lily
I'll start by copying that code, and see what needs to be changed.

1. Dim rows as Integer
2. rows = 0
3. Do While Cells(rows + 1, 1) <> ""
4.     rows = rows + 1
5. Loop Lily
Now what? Hmm… OK, I know. This is like the input validation we did before, except that instead of showing as error message, the cell's color is changed.

Let me check that pattern. I'll look in the pattern catalog: Ah! Here it is: OK, let's see… Here's some code I can use: It won't fit exactly. It never does. But I can adapt it.

Lily is mixing patterns. She's putting the validation pattern inside the loop of the find-the-end-of-the-data pattern.

1. Dim rows as Integer
2. rows = 0
3. Do While Cells(rows + 1, 1) <> ""
4.     rows = rows + 1
5.     userInput = InputBox( "What is the amount?" )
6.     If Not IsNumeric(userInput) Then
7.         MsgBox "Sorry, amount must be a number."
8.         End
9.     ElseIf userInput <= 0 Then
10.         MsgBox "Sorry, amount must be greater than zero."
11.         End
12.     EndIf
13. Loop Lily
There's no InputBox, though (line 4). The data isn't being typed in. It's coming from a cell. Let's see… The program has just tested Cells(rows + 1, 1) (line 3), so if the program gets to line 4, then that cell isn't blank. I guess we can test that cell.

1. Dim rows as Integer
2. rows = 0
3. Do While Cells(rows + 1, 1) <> ""
4.     rows = rows + 1
5.     userInput = Cells(rows, 1)
6.     If Not IsNumeric(userInput) Then Lily
The If (line 6) won't change. I'll just leave it alone.

But the way the program shows errors is different.

1. Dim rows as Integer
2. rows = 0
3. Do While Cells(rows + 1, 1) <> ""
4.     rows = rows + 1
5.     userInput = Cells(rows, 1)
6.     If Not IsNumeric(userInput) Then
7.         MsgBox "Sorry, amount must be a number."
8.         End Lily
I don't want to show a error message. I want to change the background color of the cell. How do you do that? I don't remember that anywhere.

Hmm… I'll try searching. There's a tip on how to find stuff on this site.  Lily
I'll check out the BFF page … Search on the page for "background".  Lily
Oh, that's right, I remember now. Interior.Color.

1. Dim rows as Integer
2. rows = 0
3. Do While Cells(rows + 1, 1) <> ""
4.     rows = rows + 1
5.     userInput = Cells(rows, 1)
6.     If Not IsNumeric(userInput) Then
7.         Cells(rows, 1).Interior.Color = vbRed
8.         End
9.     End If Lily
I don't want that End (line 8), because I want the program to keep going when it finds a mistake. So it can find other mistakes.

I'll add the test for negative sales as well.

1. Dim rows as Integer
2. rows = 0
3. Do While Cells(rows + 1, 1) <> ""
4.     rows = rows + 1
5.     userInput = Cells(rows, 1)
6.     If Not IsNumeric(userInput) Then
7.         Cells(rows, 1).Interior.Color = vbRed
8.     ElseIf userInput < 0 then
9.         Cells(rows, 1).Interior.Color = vbRed
10.     End If
11. Loop Lily
Let me run it.

Some good and bad here. Good thing: Lily decided to test what she has so far. Write a bit, test a bit, write a bit, test a bit…

Bad thing: before testing, Lily should have quickly read through the code. She might have picked up some mistakes. Quicker than having to go through testing cycles.  Lily
Oh, yeah.  Lily
Oops! The numeric checks are supposed to be on column 2.

1. Option Explicit
2. Private Sub cmdCheck_Click()
3.     Dim rows As Integer
4.     Dim userInput As String
5.     rows = 0
6.     Do While Cells(rows + 1, 1) <> ""
7.         rows = rows + 1
8.         userInput = Cells(rows, 2)
9.         If Not IsNumeric(userInput) Then
10.             Cells(rows, 2).Interior.Color = vbRed
11.         ElseIf userInput < 0 Then
12.             Cells(rows, 2).Interior.Color = vbRed
13.         End If
14.     Loop
15. End Sub  Lily

Except B1. I need to start the loop at the second row. Let's see…

Oo! I can initialize rows to one, then the code will check row two.

1.     rows = 1 Lily
Now, add the check for the warning for sales that could be too high.

1. Option Explicit
2. Private Sub cmdCheck_Click()
3.     Dim rows As Integer
4.     Dim userInput As String
5.     rows = 0
6.     Do While Cells(rows + 1, 1) <> ""
7.         rows = rows + 1
8.         userInput = Cells(rows, 2)
9.         If Not IsNumeric(userInput) Then
10.             Cells(rows, 2).Interior.Color = vbRed
11.         ElseIf userInput < 0 Then
12.             Cells(rows, 2).Interior.Color = vbRed
13.         ElseIf userInput >= 500 Then
14.             Cells(rows, 2).Interior.Color = vbYellow
15.         End If
16.     Loop
17. End Sub  Lily
OK! Now add the test for the region. Wait… OK, uppercase/lowercase doesn't matter. I'll add that trim thing, too.

1. Option Explicit
2. Private Sub cmdCheck_Click()
3.     Dim rows As Integer
4.     Dim userInput As String
5.     rows = 0
6.     Do While Cells(rows + 1, 1) <> ""
7.         rows = rows + 1
8.         userInput = Cells(rows, 2)
9.         If Not IsNumeric(userInput) Then
10.             Cells(rows, 2).Interior.Color = vbRed
11.         ElseIf userInput < 0 Then
12.             Cells(rows, 2).Interior.Color = vbRed
13.         ElseIf userInput >= 500 Then
14.             Cells(rows, 2).Interior.Color = vbYellow
15.         End If
16.         userInput = Cells(rows, 1)
17.         userInput = LCase(Trim(userInput))
18.         If userInput <> "north" And userInput <> "west" _
19.                 And userInput <> "central" Then
20.             Cells(rows, 1).Interior.Color = vbRed
21.         End If
22.     Loop
23. End Sub  Tara
Yay! Good job!

1. Option Explicit
2. Private Sub cmdCheck_Click()
3.     Dim rows As Integer
4.     Dim userInput As String
5.     'Initialize.
6.     rows = 1
7.     Do While Cells(rows + 1, 1) <> ""
8.         rows = rows + 1
9.         'Check sales.
10.         userInput = Cells(rows, 2)
11.         If Not IsNumeric(userInput) Then
12.             Cells(rows, 2).Interior.Color = vbRed
13.         ElseIf userInput < 0 Then
14.             Cells(rows, 2).Interior.Color = vbRed
15.         ElseIf userInput >= 500 Then
16.             'Show warning.
17.             Cells(rows, 2).Interior.Color = vbYellow
18.         End If
19.         'Check the region.
20.         userInput = Cells(rows1, 1)
21.         userInput = LCase(Trim(userInput))
22.         If userInput <> "north" And userInput <> "west" _
23.                 And userInput <> "central" Then
24.             Cells(rows, 1).Interior.Color = vbRed
25.         End If
26.     Loop
27. End Sub

## Keeping track

Let's add to Lily's code, so that it keeps track of the number of errors. This is what we want: • Initialize them to zero.
• Add 1 to an error count when each error happens.
• Output the counts.

1. Option Explicit
2. Private Sub cmdCheck_Click()
3.     Dim rows As Integer
4.     Dim regionErrors As Integer
5.     Dim salesErrors As Integer
6.     Dim salesWarnings As Integer
7.     Dim userInput As String
8.     'Initialize.
9.     rows = 1
10.     regionErrors = 0
11.     salesErrors = 0
12.     salesWarnings = 0
13.     Do While Cells(rows + 1, 1) <> ""
14.         rows = rows + 1
15.         'Check sales.
16.         userInput = Cells(rows, 2)
17.         If Not IsNumeric(userInput) Then
18.             salesErrors = salesErrors + 1
19.             Cells(rows, 2).Interior.Color = vbRed
20.         ElseIf userInput < 0 Then
21.             salesErrors = salesErrors + 1
22.             Cells(rows, 2).Interior.Color = vbRed
23.         ElseIf userInput >= 500 Then
24.             'Show warning.
25.             salesWarnings = salesWarnings + 1
26.             Cells(rows, 2).Interior.Color = vbYellow
27.         End If
28.         'Check the region.
29.         userInput = Cells(rows, 1)
30.         userInput = LCase(Trim(userInput))
31.         If userInput <> "north" And userInput <> "west" _
32.                 And userInput <> "central" Then
33.             regionErrors = regionErrors + 1
34.             Cells(rows, 1).Interior.Color = vbRed
35.         End If
36.     Loop
37.     'Output error count
38.     Cells(4, 4) = regionErrors
39.     Cells(5, 4) = salesErrors
40.     Cells(6, 4) = salesWarnings
41. End Sub

# Exercises

Exercise: Show large price changes
Download this worksheet. It has prices at the start and end of each day: You program can't be sure how many day's data are in the worksheet.

Add code that will mark large price changes. When the daily price rises by 2 or more, mark the day green. When the price falls by 2 or more, mark it red. Like this: The first row does not contain data.

When you submit, please answer this question: What did you learn from this exercise? Type in the text area on the solution form.

Attachments

mark-price-changes-start.xlsm

(If you were logged in as a student, you could submit an exercise solution, and get some feedback.)

Exercise: Find the quackers You want to hire some ducks for your new movie, "The Night of the Zombie Ducks." You go to your local duck pond, and ask the ducks to text you the number of times they quack each day. Through the Magic of Technology, the data is collected in an Excel workbook. Each text has the sender's DuId (duck id) added to it automatically.

Here's a sample: Duck beaks aren't the best for texting, so there could be bad data. Also, some ducks didn't follow the instructions, as in row 7.

• Draws lines through rows that have bad data. Quacks per day could be nonnumeric, or less than zero.
• Mark the ducks that quack 70 or more times per day.

Here's some output: Look at row 7. Both cells have strikethroughs.

Hints:

• The first row doesn't have data in it.
• Useful code you can adapt: Cells(7, 7).Font.Strikethrough = True