We saw in this earlier post how to flatten a table using purely SQL. In this post, I am going to tackle the same problem by writing a bit of VBA code to tackle the same problem. Once again, I will use the original table from that earlier post in this post also. Towards the end of the post I will talk about how the VBA version of the solution has some advantages over the SQL version of the solution.
If you are interested, you can find my earlier posts on finding the median, the mode, the geometric and harmonic means, ranking every row in a query, selecting random rows out of a table, calculating running sums and averages, calculating running differences, creating histograms, calculating probability masses out of given data, calculating cumulative distributions out of given data, finding percentile scores, percentile values, calculating distinct counts, full outer joins, parameter queries, crosstab queries, working with system objects, listing table fields,finding unmatched rows, calculating statistics with grouping, job-candidate matching, job-candidate matching with skill levels, great circle distances, great circle initial headings, using Excel functions in Access, using the windows file-picker, using the Access date-picker, setting tab indexes quickly and correctly, pre-filling forms based on previous entries, highlighting form controls, conditional formatting, performing date manipulations, counting letters, words, sentences and paragraphs, calculating age, propercasing text entries, and flattening a table (using SQL).
To refresh our memory, here is the “input” table for our task:
And, here is the output we want from it:
The VBA I am going to present in this post is in the form of a function call that will do the aggregation of author names for each book ID. I have tried to make the function call have arguments similar to a function that exists in the MySQL implementation of SQL. MySQL actually has an aggregate function for concatenating items in a field in a group by query. That function is called group_concat(), and you can read technical details about how to invoke it and what arguments its takes here. My VBA function is going to take similar arguments, but not the same ones, and I am not going to spend my time dealing with all the options that group_concat() is capable of dealing with. What this means is that if you need something as flexible as group_concat(), you have to enhance the VBA posted here yourself. It also means that if your needs for flattening a table are not covered by the VBA posted here, you have to modify the VBA to fit your needs.
The version of group_concat() I have created for Microsoft Access is posted below:
Function group_concat(strSQL As String, groupField As Integer, orderBy As String, separator as String) As String
Dim db As DAO.Database
Dim rst As DAO.Recordset
Dim strSQLFinal As String
Dim groupingValue As Integer
Dim groupedValues As String
group_concat = ""
strSQLFinal = strSQL & " where BookID = " & groupField & " " & orderBy
Set db = CurrentDb()
Set rst = db.OpenRecordset(strSQLFinal, dbOpenSnapshot)
If Not rst.BOF And Not rst.EOF Then
groupingValue = rst!BookID
groupedValues = rst!AuthorName
Do Until rst.EOF
groupedValues = groupedValues & separator & rst!AuthorName
Set rst = Nothing
Set db = Nothing
group_concat = groupedValues
The basic way this function works is quite simple. As the first argument, you pass it the SQL statement that selects the necessary columns from the table. You can pass it any SQL statement that will have both the field you want grouped and the field you want to group by in it. The advantage of doing it this way is so that you have the option to use joins and other complicated SQL syntax if that is what you need to get what you want into the function (for instance, if the field you want to order by is in another table, you can join the original table with the other table in the SQL so that the field you want to order by is available in the final query).
The next argument is the actual value of the field that you want to group. In this case, it is the bookID value for which you want to collect authors. Notice that in this implementation, I have used an integer value in the argument declaration. If your ID is not an integer, you have to make appropriate modifications to the function declaration as well as how the value is used in constructing the final SQL (strSQLFinal). Also, if you are grouping by multiple fields, you have to change the VBA appropriately.
The third argument is essentially the order by clause of the final query that is built inside the VBA function. Here you can use any fields you have access to in the SQL statement (the first argument), and use standard SQL syntax regarding ascending sorts, descending sorts, etc. This flexibility to sort the concatenated values by a third field is a big advantage of this approach over the pure SQL approach. The fourth argument is simply the separator that you specify should be used between the concatenated values. You can pass it any string including a space (“ “), a comma (“,”) or even a tab (chr(9)). The passed string will be used verbatim between multiple values that are concatenated. The string does not have to be a single character long though most separators are.
Notice that the function does not take the column names of the fields you want to group by or the field you want to group as arguments. Those field names (bookID, and AuthorName in this example) are hard-coded into the function. As I mentioned earlier, this is not a direct one-to-one replacement for the group_concat() function in Access. Much of the functionality of the function is hard-coded rather than passed in as arguments, so you will need to get your hands dirty and change the VBA code in your application to suit your needs!
The function puts the first three arguments together to build the final SQL statement that is used to open the required recordset. As you can see, the second argument is used to construct the WHERE clause of the final SQL statement so that the recordset contains only rows that need to be grouped together. The third argument puts the rows in the order in which you want the values concatenated (concatenation is not commutative, like addition is. So, we need this extra step).
The rest of the function should be pretty straightforward to read and understand. All we do is open the recordset, and set the value of group_concat to the first value of the field that needs to be grouped. Then inside the loop (as long as there are more values to concatenate), we add a separator and then the next value of the field, and so on until there are no more values to concatenate. The final concatenated string is then passed out as the value returned by the function.
Posted below is some SQL that uses the group_concat() function. Notice that we have now used the authorNumber field to sort the concatenated values, so the final output actually makes sense in this application. After all, it is not often that you need a list of authors for a book arranged in alphabetic order!
select bookID,The output produced by this query is shown in the table below:
group_concat("select bookID, authorName from myTable", bookID, "order by authorNumber",",")
As you can see, since we are selecting all rows of the original table, we still get the same number of rows as in the original table in the output. Each bookID is repeated as many times as it is present in the original table, with the same concatenated set of authors in each row for that bookID. But getting to the final output from this query is quite simple: you can either use a DISTINCT clause to throw out the duplicates that we don’t need, or you can group by the BookID and take the minimum, maximum or average of the Authors field (since the authors field is identical for all rows that contain a given bookID, the result from all three aggregate functions will be identical too).
So, here is the final SQL that gives us the output we want:
select DISTINCT bookID,And here is the final output:
group_concat("select bookID, authorName from myTable", bookID, "order by authorNumber",",") as Authors
The advantages of this approach over the pure SQL approach are as follows:
- Much more flexibility in terms of what fields you can use for sorting the concatenated values
- No multi-self-joins that can cause resource issues and other problems with large tables
- No restriction on how many values can be concatenated, and no need to determine in advance how many values to consider as “sufficient”
- No extra separators when the number of concatenated values is less than the maximum number pre-determined in the SQL query
Hope this post has been helpful in solving any problems you might have had with flattening a table using VBA in Access. The VBA code in this post has been tested in Access 2003 and should work without any problems in all versions of Access from Access 97 on up. If you have any problems or concerns, please feel free to let me know by posting a comment. If you have other questions on Access that you would like me to address in future posts, please feel free to let me know through your comments too. Good luck!