Finding the end

In our examples so far, we've known how much data there is in the worksheet. But in RL, the amount of data changes from day to day. For example, you have 280 sales records on Monday, 318 on Tuesday, 278 on Wednesday, and so on. You have to write programs that can deal with that.

What ends the data?

Here's a typical column of data:

Data

And another:

Data

And another:

Data

After a while, the data just stops. There's an empty cell after the data. Hmm… how about we read the data until we get to an empty cell?

We can't use a For-Next for that. Remember the general form:

For [index variable] = [start] To [end]

We don't know what [end] is. But we can use a regular Do loop.

Counting rows

Let's have a variable rows, that we initialize to zero. Then ask: is the next cell (rows + 1 = 1) empty?

Step

No, so add 1 to rows (it's now 1) and keep going. This is look ahead. We decide what to do, based on what the next value is.

rows is 1. Is the next cell (rows + 1 = 2) empty?

Step

No, so add 1 to rows (it's now 2) and keep going.

rows is 2. Is the next cell (rows + 1 = 3) empty?

Step

No, so add 1 to rows (it's now 3) and keep going.

rows is 3. Is the next cell (rows + 1 = 4) empty?

Step

Yes, the cell is empty, so stop the loop. rows is 3. That's how much data we have.

Steps

Here's code for it.

  1. Private Sub cmdRun_Click()
  2.     Dim rows As Integer
  3.     rows = 0
  4.     Do While Cells(rows + 1, 1) <> ""
  5.         rows = rows + 1
  6.     Loop
  7.     Cells(4, 3) = rows
  8. End Sub
Line 4 shows the look ahead. Instead of Cells(rows, 1), it tests Cells(rows + 1, 1), looking ahead one cell.

Pattern

Find the end of a data set

Situation:
The number of items in a data set varies. Maybe 502 one day, and 534 the next. You want to find the number of items.
Actions:
Use a Do loop and an index variable. Check cells one by one, until there's an empty cell.

There's something else different about the Do. Previously, we've seen loops like this:

Do
    [code]
Loop Until [logical expression]
Do
    [code]
Loop While [logical expression]

These are post-tested loops, that is, the test is at the end of the loop. The [code] always executes at least once.

There are also pretested loops, where the test is at the start of the loop.

Do Until [logical expression]
    [code]
Loop
Do While [logical expression]
    [code]
Loop

It's possible that [code] won't execute at all. For example:

  1. x = 7
  2. Do While x < 6
  3.     x = x + 1
  4.     MsgBox "x is " & x
  5. Loop
The loop body will never execute.

That might happen in our find-the-end-of-the-data program. Here's the code again:

  1. Private Sub cmdRun_Click()
  2.     Dim rows As Integer
  3.     rows = 0
  4.     Do While Cells(rows + 1, 1) <> ""
  5.         rows = rows + 1
  6.     Loop
  7.     Cells(4, 3) = rows
  8. End Sub
Suppose there's no data at all, that is, Cells(1, 1) is empty. What's the code going to do? Line 3 initializes rows to zero. The While loop's test fails, because Cells(rows + 1, 1) is "". The loop doesn't run even once. When line 7 outputs rows, it outputs 0, the correct value.

Which loop form should you use? Pretested, post-tested, While, Until… Use whichever makes the most sense to you, for the task you're working on. If the program is easier to understand when you use a post-tested loop, use that one.

Processing while counting

You can do other computations while finding the last element. For example:

  1. Private Sub cmdRun_Click()
  2.     Dim rows As Integer
  3.     Dim total As Single
  4.     'Initialize
  5.     total = 0
  6.     rows = 0
  7.     'Process
  8.     Do While Cells(rows + 1, 1) <> ""
  9.         rows = rows + 1
  10.         total = total + Cells(rows, 1)
  11.     Loop
  12.     'Output
  13.     Cells(4, 3) = rows
  14.     Cells(5, 3) = total
  15. End Sub
This works for totaling, but highest and lowest are a bit messier. So is validation. So is filtering, that we'll get to later. In fact, the more processing you mix in with the counting code, the more confusing things get. More bugs. More screaming.

Argh!

Thunking

To keep from screaming, split processing up. Remember IPO? That's the input-processing-output pattern.

Pattern

Input-processing-output

Situation:
Data needs to be transformed into other data, without user interaction after input.

Actions:
Use abstraction to define each of the three parts, and the way they communicate. Abstraction means that you:
  • Name each subroutine.
  • Specify parameters and, for functions, return values.

Replace input with counting, and it works quite well.

Let's write a program that reads an unknown amount of data and computes the total, highest, and lowest values. Let's add some validation: it won't run if there's fewer than 2 pieces of data.

Here's the main program:

  1. Private Sub cmdRun_Click()
  2.     Dim rows As Integer
  3.     Dim countOK As Boolean
  4.     Dim total As Single
  5.     Dim highest As Single
  6.     Dim lowest As Single
  7.     'Count data cells.
  8.     countDataCells rows, countOK
  9.     If Not countOK Then
  10.         End
  11.     End If
  12.     'Processing.
  13.     computeStats rows, total, highest, lowest
  14.     'Output
  15.     outputStats rows, total, highest, lowest
  16. End Sub
All countDataCells does is, er, count the data cells. It returns the number of cells, and a flag showing whether everything went OK. If it didn't, the program stops.

Here's the code for the sub:

  1. 'Count the number of data cells.
  2. '  rows: number of cells.
  3. '  countOK: flag showing whether there's a problem with the data.
  4. Sub countDataCells(rows As Integer, countOK As Boolean)
  5.     'Initialize.
  6.     rows = 0
  7.     'Count.
  8.     Do While Cells(rows + 1, 1) <> ""
  9.         rows = rows + 1
  10.     Loop
  11.     'Validate.
  12.     countOK = True
  13.     If rows < 2 Then
  14.         MsgBox "Sorry, you must have at least two data items."
  15.         countOK = False
  16.     End If
  17. End Sub
Lines 8 to 10 are the look ahead loop we saw before. The countOK flag is set to True (line 12). It's switched to False only if there's a problem with too little data.

Once we have a count, what's next? Here's part of the main program again:

  1. 'Count data cells.
  2. countDataCells rows, countOK
  3. If Not countOK Then
  4.     End
  5. End If
  6. 'Processing.
  7. computeStats rows, total, highest, lowest
We send rows into computeStats, and it sends back total, highest, and lowest:

  1. 'Compute statistics.
  2. '  rows: number of data items.
  3. '  total: sum of all items.
  4. '  highest: highest value.
  5. '  lowest: lowest value.
  6. Sub computeStats(rows As Integer, total As Single, _
  7.         highest As Single, lowest As Single)
  8.     Dim row As Integer
  9.     'Initialize.
  10.     total = Cells(1, 1)
  11.     highest = Cells(1, 1)
  12.     lowest = Cells(1, 1)
  13.     For row = 1 To rows
  14.         total = total + Cells(row, 1)
  15.         If Cells(row, 1) > highest Then
  16.             highest = Cells(row, 1)
  17.         End If
  18.         If Cells(row, 1) < lowest Then
  19.             lowest = Cells(row, 1)
  20.         End If
  21.     Next row
  22. End Sub
Hey! Now that we know how many data elements there are, we can use a For-Next loop. Woohoo! Just copy the code from before.

The main program again:

  1. 'Count data cells.
  2. countDataCells rows, countOK
  3. If Not countOK Then
  4.     End
  5. End If
  6. 'Processing.
  7. computeStats rows, total, highest, lowest
  8. 'Output
  9. outputStats rows, total, highest, lowest
We send rows, total, highest, and lowest into outputStats:

  1. 'Output statistics.
  2. '  rows: number of data items.
  3. '  total: sum of all items.
  4. '  highest: highest value.
  5. '  lowest: lowest value.
  6. Sub outputStats(rows As Integer, total As Single, _
  7.         highest As Single, lowest As Single)
  8.     Cells(4, 3) = rows
  9.     Cells(5, 3) = total
  10.     Cells(6, 3) = highest
  11.     Cells(7, 3) = lowest
  12. End Sub
Thunking the program makes it easier to write, test, and read. For example, when we write countDataCells, we only have to think about counting cells. Toss the stats and output stuff out of our heads, for now. Simpler code. Fewer bugs. Less screaming.

We can also have different people write the subs, then put the whole thing together.

We can replace one of the subs, without touching the others. For example, suppose The Boss wants the output to show in message boxes. Easy peasy.

  1. 'Output statistics.
  2. '  rows: number of data items.
  3. '  total: sum of all items.
  4. '  highest: highest value.
  5. '  lowest: lowest value.
  6. Sub outputStats(rows As Integer, total As Single, _
  7.         highest As Single, lowest As Single)
  8.     MsgBox "Elements: " & rows
  9.     MsgBox "Total: " &  total
  10.     MsgBox "Highest: " & highest
  11.     MsgBox "Lowest: " & lowest
  12. End Sub
Nothing else changes. We don't have to think about, test, debug, or scream about the rest of the program.

Let's thunk

From now on, let's organize our code in chunks like this. Our programs are going to get more and more complex. Thunking will keep the complexity under control. Less screaming.

That's a Good Thing.

Exercises

Exercise: Count the data values
Download this workbook. It looks like this:

Worksheet

Add code that counts the number of data values in column 2, and shows the count in row 5, column 1. The usual coding standards apply.

Upload your workbook.

Attachments

count-data.xlsm

(If you were logged in as a student, you could submit an exercise solution, and get some feedback.)

Exercise: Comparing regional sales
Download a workbook with sales for two regions:

Data

The first column has sales for the north region. The second has sales for the west region. The number of sales in each column might be different.

Write a program that will total the first column and the second. If the first column is higher, output "North" in row 5, column 3. If the second is higher, output "West". If they are the same, output "Equal".

Some of the code has been given for you. Use the code, without change.

The usual coding standards apply.

Upload your workbook.

Attachments

compare-regions.xlsm

(If you were logged in as a student, you could submit an exercise solution, and get some feedback.)

Summary

So far, we've known how much data there is in the worksheet. But in RL, the amount of data changes from day to day. We can't use a For-Next loop, because we don't know where to end.

We can use a regular Do loop, with look ahead. We decide what to do, based on what the next value is.

The more processing you mix in with the counting code, the more confusing things get. A way around this is to split processing up. Thunking the program makes it easier to write, test, and read. We can also have different people write the subs, then put the whole thing together.