Compiling dynamic Excel formulas to static references

“Hey, could you help me on this financial spreadsheet?”, my friend asked.

“Sure”, I respond. I may have forgotten my finance classes, but I’m okay with Excel.

The assignment: some blocks of formulas needed to be duplicated into multiple blocks, but if copy-pasted the formulas would no longer be right. Excel by default isn’t good with this; sure, you have $A$1 vs. A1, but multiple blocks of essentially the same logic means a formula should be able to generalize in one way across the Y-axis within a block, yet in another way across the same axis to generalize between the top-left cell between blocks 1 and 2 (if also arranged vertically). Removing that $ won’t suffice anymore.

At this point I’m very much inclined to attempt redoing the whole thing in PowerPivot, but the model seems a bit big, so might as well stick with what we had. I somehow managed to generalize things, ending up with formulas like this:

=INDIRECT("F"&$B23) * $F17 * INDIRECT($C23&":"&$C23)

vs. the original:

=$F$11 * $F13 * J$12

.. among other gems such as this one:

=INDEX(Revenues!$A:$ZY,$D70+$B70,COLUMNS($A$1:J$1))

This somehow ended up dynamic enough, so content, I send it back.

“Great!”, my friend said. “Except could you make the formulas back the way they were again? I know it’s probably a lot of manual editing and copy pasting, but financial modeling standards demand transparency.”

“Then why not also show the old static version for their check?”, I ask, defensively. “Static formulas won’t scale, and trying to paste/fix using VBA for each sheet structure will be horrible. Either tell them to accept PowerPivot, or let these econometrists waste away their own time manually editing all day if that’s what they’d prefer.”

“Sorry, finance people don’t ‘get’ Excel, they’d rather just have simple formulas without complicated lookups”, my friend explains.

“Perhaps they shouldn’t be in finance then”, I rebuke in a last desperate struggle. “I thought ‘econometrist’ sounded smarter than someone copy-pasting all day!”

But the damage was done; my dynamic Excel formulas rejected. I could still rebuild everything in PowerPivot, but felt courageous enough to go find a way to convert my dynamic formulas back into simple ones.

I figured it out after discovering an existing formula parser; the result of my efforts can be downloaded here (just download as it’s about the VBA). The meat is in the CompileReferences sub — if ran, it converts formulas with the likes of INDEX and INDIRECT back into simple, static, direct cell references. If run, you end up with the simplified static version of the file, for my sample case here.

A paste of the main VBA can be found here, though it also requires regXLib and cregXLib from here (plus the corresponding reference to Microsoft VBScript Regular Expressions 5.5).

My testing has been limited to this one spreadsheet so far, and PowerPivot conversion is still my next step, but perhaps this may be of interest to others stuck with the dilemma of dynamic Excel formulas vs. ‘financial modeling standards’ grade transparency.

Posted in Uncategorized | Leave a comment

Sharing Power Query functions — the ‘library on Github’ way

For 1.5 years of interesting spreadsheets and related tricks I’ve (largely) neglected to post about any. I’m among a billion voices, so for the most part adding to the noise seems useless. This time I found something worth spreading.

I’ve been pretty evangelic about Excel’s shiny recent plugins (their names start with ‘Power’), and recently learned something cool about Power Query. I’d known of the ability to dynamically load M code from a text file, which seemed interesting. It was only a couple days ago that I found out Power Query also allows defining reusable functions though. This was exciting to me because it would allow the kind of modular kind of coding I liked of Excel VBA — solve a problem once, make the solution into a new function, problem solved (permanently).

I then revisited Chris’s mentioned blog post, and found this thread on how to dynamically load code for such user-defined functions from a file. This combination seemed more than significant to me, as it would bring this modular coding to Power Query as well. Find yourself using the same three functions in a row a few times? Make it a new function and save some lines each time! Beats being stuck with nothing but those regular functions.

With this in mind, I externalized a couple functions I had / could find online (yeah mostly Chris Webb’s blog) into separate files, and put them into a Github repository so others could use them as well, and hopefully add new functions. Further explanation in there.

 

Posted in Uncategorized | Tagged | 4 Comments

JSON feeds in Excel: importing exchange rate data for currency conversion

Importing your data is half the work, and while my previous post was about importing XML data (web pages), Excel’s new Webservice function can also be used to import data in JSON, commonly used for data feeds. This is also the format used by the Open Exchange Rates API, which I used to this end.

There is no native functionality for parsing data in this format in Excel unfortunately, though fortunately the bulk of the work has already been done by Bruce McPherson’s Rest to Excel library. The options to populate an Excel sheet with this were still limited though, throwing an “input json object not defined” error for any feed that doesn’t fit its structure assumptions. Here is the function I used to fix this (using the cJobject class from McPherson’s cdataset.xlsm):

Public Sub testPopulate()
Set cj = New cJobject
Set cj = cj.deSerialize(Range("json"))
'Debug.Print cj.serialize
'Debug.Print cj.formatData
Call cJObjectPopulate(cj, rstart:=Range("A6"), i:=1, j:=1)
End Sub

Private Sub cJObjectPopulate(cj As Variant, rstart As Range, ByRef i As Integer, j As Integer)
For Each child In cj.children
rstart.Cells(i, j).value = child.Key
i = i + 1
If child.hasChildren Then
Call cJObjectPopulate(child, rstart, i, j + 1)
Else: rstart.Cells(i - 1, j + 1).value = child.value
End If
Next child
End Sub

In short, get that cJobject class and these VBA functions, import the data feed in question (i.e. =WEBSERVICE(“http://openexchangerates.org/api/latest.json?app_id=0000000000000000000000000000”), replacing the zeroes with your API key), name the cell containing the JSON data ‘json’, and run the above macros to populate the sheet with its contents (starting from cell A6 in this case).

At this point, all the data was ready in my newly populated spreadsheet. The next step? Ctrl-Shift-F3, creating named ranges for all the data fields based on the currency identifiers:

Screenshot (4)

So now all the data is in place in an accessible way — ready to be used. So all that’s left is to add a currency conversion function:

Function CurrencyConv(amount As Double, ByVal cur1, ByVal cur2) As Double
If Not IsNumeric(cur1) Then cur1 = Range(cur1) 'if string
If Not IsNumeric(cur2) Then cur2 = Range(cur2) 'if string
CurrencyConv = Round(amount * cur2 / cur1, 2)
End Function

At this point you can just enter =CurrencyConv(10,EUR,USD), and it’ll do the math for you. In many cases however, I’d have my currencies stored in separate cells from the amounts, and would want to refer to those cells instead. In that case this function transforms the currency strings contained in those cells (still like ‘EUR’, without the quotes) into cell references, taking the values from the cells bearing those names.

Screenshot (5)

… so this’d work fine too.

Bonus trick to impress your boss:

Currency conversion is cool, but including the cell formatting as well makes it even more impressive. In the appropriate sheet, I put in currency drop-down boxes using data validation lists, then created a named range ‘curr’ for all currency cells that should change to the desired currency. After that, include the following VBA sub in the sheet in question:

Private Sub Worksheet_Change(ByVal Target As Range)
If Not Intersect(Target, Me.Range("curr")) Is Nothing Then Call formatCurrencies(Intersect(Target, Me.Range("curr")))
End Sub

This will run the formatCurrencies sub in case any of the cells within the ‘curr’ range change (telling the function which ones did). Then, finally, we should adjust the format of the cells in question to display the correct currency symbol (in this case taken from the cell named ‘currency’), with an example for the currencies I used here:

Public Sub formatCurrencies(r as range)
Dim curr As String
curr = Range("currency").Value
Select Case curr
Case "USD" r.NumberFormat = "_([$$-2] * #,##0.00_);_([$$-2] * (#,##0.00);_([$$-2] * ""-""??_);_(@_)"
Case "HKD" r.NumberFormat = "_([$HK$-2] * #,##0.00_);_([$HK$-2] * (#,##0.00);_([$HK$-2] * ""-""??_);_(@_)"
Case "EUR" r.NumberFormat = "_([$€-2] * #,##0.00_);_([$€-2] * (#,##0.00);_([$€-2] * ""-""??_);_(@_)"
Case "CNY", "RMB", "JPY" r.NumberFormat = "_([$£¤-2] * #,##0.00_);_([$£¤-2] * (#,##0.00);_([$£¤-2] * ""-""??_);_(@_)"
Case Else
r.NumberFormat = "0.00"
End Select
End Sub

.. and that’s it! This allows you to easily use up-to-date exchange rates for currency conversion in Excel, as well as easily changing desired display currency to make Excel update both the afflicted values as well as their currency symbols.

Anyone else had experience with currency conversion or JSON in their sheets?

Posted in Uncategorized | Tagged , , , , , , | 6 Comments

Getting data with Excel 2013: using FilterXML for web scraping from Alibaba

As a spreadsheet geek, I was thrilled to find out about the new functionality in Excel 2013, most notably its new webservice and filterXML functions, which seemed meant to put it back on par with Google Docs’ much-acclaimed ImportXML function. Because getting the data where you want it (in your spreadsheet) is half the work.

Unfortunately, both GDocs’ ImportXML and Excel’s Webservice and FilterXML functions have their flaws. GDocs’ ImportXML doesn’t fully support XPath (1.0) — it doesn’t allow you to pick out a second div using /div[2], instead returning a block of anything matching what you asked for, making it hard to get only the parts you wanted exactly where you wanted them. Worse yet, you can only use such functions 50 times per sheet at the time of writing, while there is no native way to import a web page in one cell to parse its contents from other cells either. (Note: neither GDocs nor Excel support scraping from websites requiring authentication, for which you’re better off with Ruby gem Watir.) As such, I turned towards Excel instead.

Challenges in Excel included the following:

  • Any of the existing ‘import data’ functions did not seem scaleable for targeted scraping.
  • Excel currently limits the number of characters per cell to 32,767 characters. That’d seem like a lot, but the web pages I wanted to store in them were more around 77k characters. If you try anyway, you’ll be welcomed by a puzzling #VALUE error.
  • Unlike GDocs’ ImportXML function, Excel’s FilterXML function is pretty picky. Does your web page contain an XML error somewhere? Enjoy your undescriptive #VALUE error; good luck figuring out why you got it!
  • For me using attribute names in the XPath queries (e.g. /div[@class=’stuff’]) did not work on the actual pages I wanted to scrape. Not sure what caused it, would be glad to hear others’ experiences about this.
  • Worse yet, other blogs covering these new functions only seemed to casually go over them, not mentioning these issues or potential workarounds.

The Excel VBA functions (parts and bit assembled from around the internet, then ) I used to overcome these issues were as follows. The showXML/fixXML functions were mine, others probably heavily borrow from code snippets on StackOverflow.

Function importXML(url As String, XPath As String) As String ', Optional trunc As Boolean = False) As String
importXML = WorksheetFunction.FilterXML(showXML(url), XPath)
End Function

Function showXML(url As String, Optional trunc As Boolean = False) As String
Dim s As String
s = ShowHTML(url)
'clean up unnecessary stuff to avoid exceeding the 32k character limit
s = regReplace("", s, "")
s = regReplace("", s, "")
s = regReplace("[a-zA-Z0-9_\-]+=['""]{(.)*?}['""]", s, "")
s = regReplace("[\s]+", s, " ")
s = regReplace("\s*([<>=])\s*", s, "$1")
s = regReplace("<script(.)*?/script>", s, "")
s = regReplace("<meta([^>]*?/>|(.)*?)", s, "")
s = regReplace("<link([^>]*?/>|(.)*?)", s, "")
s = regReplace("<style([^>]*?/>|(.)*?)", s, "")
s = regReplace("<noscript([^>]*?/>|(.)*?)", s, "")
s = regReplace("<header([^>]*?/>|(.)*?)", s, "")
s = regReplace("<footer([^>]*?/>|(.)*?)", s, "")
s = regReplace(" style\s*=\s*(""(.)*?""|'(.)*?')", s, "")
s = regReplace(" title\s*=\s*(""(.)*?""|'(.)*?')", s, "")
s = regReplace(" alt\s*=\s*(""(.)*?""|'(.)*?')", s, "")
s = regReplace(" rel\s*=\s*(""(.)*?""|'(.)*?')", s, "")
s = regReplace(" target\s*=\s*(""(.)*?""|'(.)*?')", s, "")
s = regReplace(" type\s*=\s*(""(.)*?""|'(.)*?')", s, "")
s = regReplace(" data-[a-zA-Z_\-]*?\s*=\s*(""(.)*?""|'(.)*?')", s, "")
s = Replace(s, "&", "&")
s = fixXML(s)
's = WorksheetFunction.FilterXML(s, XPath)
If (trunc = True) Then s = truncate(s)
showXML = s
End Function

Public Function ShowHTML(strURL As String, Optional trunc As Boolean = False) As String
Dim oXMLHTTP As New MSXML2.XMLHTTP
With oXMLHTTP
.Open "GET", strURL, False
.send ""
If .Status = 200 Then
ShowHTML = .responseText
Else
ShowHTML = .statusText
End If
End With
ShowHTML = Trim(ShowHTML)
Set oXMLHTTP = Nothing
If trunc Then ShowHTML = truncate(ShowHTML)
End Function

Public Function fixXML(s As String) As String 'fix malformed XML
Dim i As Long, toPos As Long, openClose As Long, kind As String, matches As Variant, varKind As String, startAttr As Long, layer As Integer, attrs As New Scripting.Dictionary, tagStack As New Scripting.Dictionary, deepestOcc As String, j As Integer, spacePos As Long
i = 0
layer = 0
Do
i = InStr(i + 1, s, "<") 'start tag
If i < 1 Then Exit Do 'no more tags, all done 'kill superfluous tags openClose = InStr(i, s, ">") 'end tag
soFar = Left(s, openClose - 1)
If (Mid(s, i + 1, 1) = "/") Then 'closing tag
kind = Trim(Mid(s, i + 2, openClose - i - 2)) 'tag type
deepestOcc = GetKey(tagStack, kind, True)
If (deepestOcc = "") Then 'if no matches
Debug.Print (String(layer, " ") & "-" & kind & " (removed)")
s = Left(s, i - 1) & Mid(s, openClose + 1) 'remove superfluous closing tag
i = i - 1
Else '(assumes forgotten close -- dangerous assumption)
Do While (layer <> CInt(deepestOcc))
s = Left(s, i - 1) & "" & Mid(s, i) 'append forgotten closes
i = i + Len(tagStack.Item(layer)) + 3
Debug.Print (String(layer, " ") & "-" & tagStack.Item(layer) & " (forced)")
tagStack.Remove (layer) 'remove tag from hierarchy stack
layer = layer - 1
Loop
'If kind = tagStack.Item(layer) Then 'the close tag matches
Debug.Print (String(layer, " ") & "-" & kind)
tagStack.Remove (layer) 'remove tag from hierarchy stack
layer = layer - 1
'End If
End If
Else 'opening tag, remove if unclosed
If InStr(i, s, "/>") > 0 And InStr(i, s, "/>") < openClose Then GoTo nextIteration 'empty tags don't need closing
spacePos = InStr(i, s, " ")
If spacePos < 1 Then spacePos = 65000 'should exceed document size
kind = Trim(Mid(s, i + 1, WorksheetFunction.Min(spacePos, openClose - 1) - i)) 'tag type
layer = layer + 1
Debug.Print (String(layer, " ") & "+" & Mid(s, i + 1, openClose - i - 1))
Call tagStack.Add(layer, kind)
'If InStr(i, s, " 0) 'close remaining tags on stack
Debug.Print (String(layer, " ") & "-" & tagStack.Item(layer) & " (forced cleanup)")
s = s & "" 'append forgotten closes
tagStack.Remove (layer) 'remove tag from hierarchy stack
layer = layer - 1
Loop
fixXML = s
End Function

Public Function truncate(ByRef s As String)
Const limit As Integer = 32767
If (Len(s) > limit) Then
MsgBox Len(s) & " > " & limit & ", truncating!"
s = Left(s, limit)
End If
truncate = s
End Function

Public Function regReplace(patt As String, inText As String, withText As String) As String
Dim regEx
Set regEx = CreateObject("vbscript.regexp")
With regEx
.MultiLine = True
.Pattern = patt
.Global = True
End With
regReplace = regEx.Replace(inText, withText)
End Function

Public Function getMatches(s As String, reg As String) As Variant
Dim regEx
Set regEx = New RegExp
regEx.Pattern = reg
regEx.ignorecase = False
regEx.Global = True
Set getMatches = regEx.Execute(s)
End Function

Function GetKey(dic As Scripting.Dictionary, strItem As String, Optional last As Boolean = False) As String
Dim key
GetKey = ""
For Each key In dic.keys()
If dic.Item(key) = strItem Then
GetKey = key
If (last = False) Then Exit Function 'sucks, still asc.
End If
Next
End Function

Usage guide: use the above ShowXML function to import the source of a web page where the Webservice function fails (setting the optional truncate parameter to True where necessary to dodge the 32k character limit), then pick out the good snippets from that using FilterXML. My sample spreadsheet can be found here.

The result? I managed to use as an input only one Alibaba product listing URL for a page like this:

Screenshot (2)

… and use it to scrape all product details for all its listed products from their individual pages, like this one:

Screenshot (3)

I guess the next step for me would to write a VBA function to automate scraping a bunch of links this way, then dumping the contents in one big table. So far the only issue for me was the clunky XPath though. Has anyone else figured out how to fix the errors for attribute references (forcing me to use ten /div’s instead of the more desirable “//a[@class=’mc-click-target’]/@href”)?

Posted in Uncategorized | Tagged , , , , , , , | 8 Comments