From array to data file

Let's look at a practical use of arrays. Lots of software can save data in Excel format, including this Web site. Instructors can download data on student work, in Excel format. Accounting software, inventory programs, HR software… many can save data in Excel format.

Here's Diana.

Diana

Hi! Nice to meet you.

Diana and you work for the same company. Diana is in HR, you are in IS. Part of Diana's job is to keep the employee database up-to-date. Add new employees, update benefits plans, like that. She also analyses that data for her boss.

HR is trying out WOMBAT, a new data analysis program that's still in beta (that is, still being tested). Diana wants to export data from the HR database into WOMBAT. The HR DB can save data in Excel, and WOMBAT is supposed to read Excel files. Here's Diana's plan:

Plan

Diana starts the HR DB program, goes to the export screen, and chooses the fields she wants:

Exporting

She choose EID (eployee id), last name, status (full-time or part-time), and health insurance plan (a code for the plan, e.g., "BC/P"). SHe clicks the Export to Excel button, and saves the workbook created by the HR DB. Diana opens it to make sure the data is there:

Workbook

Hooray! So far, so good.

Diana starts WOMBAT, and tries to import the workbook:

Error

Doh!

You – to the rescue!

Your boss asks you to help. You contact the WOMBAT people, and they tell you that, yes, it won't import Excel. But it will import CVS (comma-separated values) files. Some CSV data:

1000, 35
1001, 20
1002, 38
1003, 32
1004, 32
1005, 32
1006, 22

It's just values, separated by commas. A simple, widely used format.

"No problem," you say to Diana. "We'll just save the data in CSV. Excel can do that."

"Great!" she says. "Just one thing…"

(Uh oh.)

"Could you get rid of the records for part-timer employees with the NH1 plan? The HR DB can't filter them out of the export. And WOMBAT can't remove them, either."

(Sigh.)

"OK," you reply. "I'll write some VBA."

So you do.

Excel to array to CSV file

Tara
Tara
Lily and Jeremy. You want to take a crack at this?

Lily
Lily
OK. Take data from the worksheet, write it out, except without the part-time NH1ers.

How do you want to thunk about this?

Jeremy
Jeremy
Huh?

Lily
Lily
What are the big chunks?

I'm thinking maybe two chunks:

  • Read the data into an array, filtering out the records we don't want.
  • Write the array to a file.

What do you think?

Jeremy
Jeremy
OK, except for one thing. Here's the worksheet.

Worksheet

We've only read data for one column. There are four!

Lily
Lily
Oh, yeah. Uh, Tara, what do we do?
Tara
Tara
There are a few ways to do it. The easiest is to create one array for each column. When you add data to one array, add to them all.
Lily
Lily
Oh, OK! I think I get it.

Let's get started with the main program. I'll add a Run button… there. Now I'll just start with this:

Private Sub cmdRun_Click()
    'Read data from worksheet into arrays.

    'Write CSV data file from arrays.
End Sub

OK, that's a start.

W00t for Lily! She's starting with the big chunks (this is often called top-down programming). She's just putting in the comments, too. A Good Thing.

Jeremy
Jeremy
Oh, I remember how we do this now. I'll add the calls to the first subs. We'll need an array for each data column. EIDs, names, statuses, and plans.

Private Sub cmdRun_Click()
    'Read data from worksheet into arrays.
    readFilterData eids, lastNames, employmentStatuses, plans
    'Write CSV data file from arrays.
End Sub

How does that look?

Lily
Lily
Not bad, though remember we don't know how many rows there are. We'll need our code to find the end of the data, like we did before, and return the number of rows to the main program.

So, I guess that means four counts, one for each column?

Jeremy
Jeremy
Hmm, maybe not. Every column has the same number of rows. So we should get by with one count variable. Use it for all the columns.
Lily
Lily
Hey, that's right? So maybe this:

Private Sub cmdRun_Click()
    'Read data from worksheet into arrays.
    readFilterData eids, lastNames, employmentStatuses, plans, >>numRows<<
    'Write CSV data file from arrays.
End Sub

That look's good. But let's declare the variables, and add the second call to the main program.

(Typity, typity)

Private Sub cmdRun_Click()
    Dim eids(500) As String
    Dim lastNames(500) As String
    Dim employmentStatuses(500) As String
    Dim plans(500) As String
    'Read data from worksheet into arrays.
    readFilterData eids, lastNames, employmentStatuses, plans, numRows
    'Write CSV data file from arrays.
    writeCsvFile eids, lastNames, employmentStatuses, plans, numRows
End Sub

I'm guessing at the 500. We can always change it, if it isn't enough.

How's that look?

Jeremy
Jeremy
Great! But why did you make eids an array of strings? They're numbers.
Lily
Lily
Yeah, but we're not going to do any calculations on them. We could make them numbers instead, if we make them Long rather than Integer. They need to be Long, because they're more than 32,767.
Jeremy
Jeremy
I get it now. Let's leave it as it is.

How about adding stubs and comments for the Subs? Help keep us on track.

Private Sub cmdRun_Click()
    Dim eids(500) As String
    Dim lastNames(500) As String
    Dim employmentStatuses(500) As String
    Dim plans(500) As String
    'Read data from worksheet into arrays.
    readFilterData eids, lastNames, employmentStatuses, plans, numRows
    'Write CSV data file from arrays.
    writeCsvFile eids, lastNames, employmentStatuses, plans, numRows
End Sub

'Read data from the worksheet into arrays, filtering as we go.
'Params:
'  eids: array of employee ids (output).
'  lastNames: array of last names (output).
'  employmentStatuses: array of full-time/part-time statuses (output).
'  plans: array of health plan codes (output).
'  numRows: number of rows read (output).
Sub readFilterData(eids, lastNames, employmentStatuses, plans, numRows As Integer)

End Sub

'Write data to CSV file.
'Params:
'  eids: array of employee ids (input).
'  lastNames: array of last names (input).
'  employmentStatuses: array of full-time/part-time statuses (input).
'  plans: array of health plan codes (input).
'  numRows: number of rows read (input).
Sub writeCsvFile(eids, lastNames, employmentStatuses, plans, numRows As Integer)

End Sub

Well, how's that?

Now they have the complete top level of the program. Huzzah!

Jeremy
Jeremy
I'm not sure how to write the arrays as params in a Sub's signature. I put:

Sub readFilterData(eids, lastNames, ... numRows As Integer)

Tara, how do you do it?

Tara
Tara
You can leave it like that, without a type, and VBA will figure it out. Or you can declare the type of each array like this:

Sub readFilterData(eids() As String, lastNames() As String, ... numRows As Integer)

The empty parens tell VBA that you're passing arrays.

Either way is fine.

The input sub

Jeremy
Jeremy
Let's try the input sub. Here's the code so far:

'Read data from the worksheet into arrays, filtering as we go.
'Params:
'  eids: array of employee ids (output).
'  lastNames: array of last names (output).
'  employmentStatuses: array of full-time/part-time statuses (output).
'  plans: array of health plan codes (output).
'  numRows: number of rows read (output).
Sub readFilterData(eids() As String, lastNames() As String, _
        employmentStatuses() As String, plans() As String, numRows As Integer)

End Sub

Now what? Hmm… you know, we just saw how to read data from a worksheet into an array, in the last lesson. I'm going to grab some code from there, and paste it here. Then adjust it.

'Read data from the worksheet into arrays, filtering as we go.
'Params:
'  eids: array of employee ids (output).
'  lastNames: array of last names (output).
'  employmentStatuses: array of full-time/part-time statuses (output).
'  plans: array of health plan codes (output).
'  numRows: number of rows read (output).
Sub readFilterData(eids() As String, lastNames() As String, _
        employmentStatuses() As String, plans() As String, numRows As Integer)
    Dim rowSource As Integer
    Dim rowDestination As Integer
    rowSource = 0
    rowDestination = 0
    Do While Cells(rowSource + 1, 1) <> ""
        'Is the data above 250?
        If Cells(rowSource + 1, 1) > 250 Then
            'Copy value to destination.
            rowDestination = rowDestination + 1
            sales(rowDestination) = Cells(rowSource + 1, 1)
        End If
        rowSource = rowSource + 1
    Loop
End Sub

Lil, do you remember how this works?

Lily
Lily
Aye! rowSource is a counter, tracking which row in the data source (the worksheet) we're getting data from.

rowDestination is a counter as well, but it tracks where we put data in the destination (the array).

The While loops until it finds an empty cell. If the data in a cell passes the filter (the If statement), then it gets put into the array.

This should be easy to change. How about this:

'Read data from the worksheet into arrays, filtering as we go.
'Params:
'  eids: array of employee ids (output).
'  lastNames: array of last names (output).
'  employmentStatuses: array of full-time/part-time statuses (output).
'  plans: array of health plan codes (output).
'  numRows: number of rows read (output).
Sub readFilterData(eids() As String, lastNames() As String, _
        employmentStatuses() As String, plans() As String, numRows As Integer)
    Dim rowSource As Integer
    Dim rowDestination As Integer
    Dim eid As String
    Dim lastName As String
    Dim employmentStatus As String
    Dim plan As String
    'Init row counters.
    rowSource = 0
    rowDestination = 0
    Do While Cells(rowSource + 1, 1) <> ""
        rowSource = rowSource + 1
        'Get a row's data.
        eid = Cells(rowSource, 1)
        lastName = Cells(rowSource, 2)
        employmentStatus = Cells(rowSource, 3)
        plan = Cells(rowSource, 4)
        'Save this one?
        If Not (employmentStatus = "p" And plan = "NH1") Then
            'Copy value to destination.
            rowDestination = rowDestination + 1
            eids(rowDestination) = eid
            lastNames(rowDestination) = lastName
            employmentStatuses(rowDestination) = employmentStatus
            plans(rowDestination) = plan
        End If
    Loop
    numRows = rowDestination
End Sub

I added variables like eid and lastName to put the data in each row into. Made it easier to think about what the code does. Changed the filter test. Used all four arrays. Not much else changed.

This is what programmers do. They take something that works, and adapt it.

The output sub

Jeremy
Jeremy
Let me try the output sub. We know how many data rows there are in the arrays, so we can use a For loop. Something like this:

'Write data to CSV file.
'Params:
'  eids: array of employee ids (input).
'  lastNames: array of last names (input).
'  employmentStatuses: array of full-time/part-time statuses (input).
'  plans: array of health plan codes (input).
'  numRows: number of rows read (input).
Sub writeCsvFile(eids() As String, lastNames() As String, _
        employmentStatuses() As String, plans() As String, numRows As Integer)
    Dim index As Integer
    Open ThisWorkbook.Path & "\hr-db.csv" For Output As #1
    For index = 1 To numRows
        >>Write out eids(index), lastNames(index), employmentStatuses(index), plans(index)<<<
    Next index
    Close #1
End Sub

But how do you output data in the CSV format, with all those commas? Seems like there should be an easier way.

Tara, is there?

Tara
Tara
Indeed! Checkout the Write statement. That's what it does.

Lily
Lily
(Google, Google. Read, read.)

Oh, man! That makes it so easy!

'Write data to CSV file.
'Params:
'  eids: array of employee ids (input).
'  lastNames: array of last names (input).
'  employmentStatuses: array of full-time/part-time statuses (input).
'  plans: array of health plan codes (input).
'  numRows: number of rows read (input).
Sub writeCsvFile(eids() As String, lastNames() As String, _
        employmentStatuses() As String, plans() As String, numRows As Integer)
    Dim index As Integer
    Open ThisWorkbook.Path & "\hr-db.csv" For Output As #1
    For index = 1 To numRows
        >>Write #1, eids(index), lastNames(index), employmentStatuses(index), plans(index)<<
    Next index
    Close #1
End Sub

Write rocks CSV files!