MQ: "Made-up-ness" quotient of SEC filings

Valerie Aurora

Bringing financial analysis to the masses

Note: Now happily superseded by FreeRisk.org.

One day, while carefully and diligently filling out my expense report with the exact, actual, and true cost of my lunch, plus tip up to but not exceeding 15% of the total bill, I got to thinking about how actual random numbers and human-made "random" numbers differ. For example, in actual random numbers, the numbers (with dollar signs just to make this interesting):

$10.27
$22.22
would be equally likely to appear, but in human-made "random" numbers you would probably never see $22.22, because it doesn't look "random" to a human.

Then I thought about companies that "massage" their financial reports to meet targets, or outright fake their reports to defraud investors. Presumably, when they are cooking their books, most crooks don't bother to use a high-quality random number generator and instead just make stuff up, human-style. Wouldn't this result in an abnormal distribution of numbers that could be detected via statistical analysis? While this has been done many times before by professional accountants and auditors in specific cases, it would be nice to have a publicly available tool that ordinary humans can use.

So, at SuperHappyDevHouse 9, I decided to write a set of scripts to automatically grab SEC filings (10-Q and 10-K), strip out all the numbers, and then run statistical tests. After some mucking about I found that (a) there is no standard format for SEC filing data, (b) automatically processing it is a known difficult problem with about a 30% error rate, and (c) Google Finance spits out a nice set of uniformly formatted financial data, which is perfect for my use. Once I know the company ID, a little wget action and some sed/awk/bash/perl scripting (later replaced with Python) is all I need.
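The "strip out all the numbers" step can be sketched roughly as follows. This is only an illustration, not the original scripts: the function name `extract_numbers` and the regular expression are assumptions, and the sketch ignores details such as parenthesized negatives or "in millions" scaling.

```python
import re

# Matches plain numbers such as 42, 0.75, or 1,234.56.
# Assumption: commas are thousands separators; signs and parentheses are ignored.
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_numbers(text):
    """Return every number found in text as a float (hypothetical helper)."""
    values = []
    for match in NUMBER_RE.findall(text):
        cleaned = match.replace(",", "")  # drop thousands separators
        values.append(float(cleaned))
    return values
```

For example, `extract_numbers("Revenue 1,234.50 Cost 22.22")` yields the two values 1234.5 and 22.22, ready for the digit tests.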

The first test is one David Weekly suggested: record the distribution of leading digits. The distribution should follow Benford's Law, which (criminal simplification alert) basically says that the first digits of numbers built out of lots of little real-world numbers should fall into a logarithmic distribution; specifically, the first digit N should appear with probability log10(1 + 1/N).
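As a minimal sketch of this test (the function names here are made up for illustration and are not from the original scripts), the expected Benford probabilities and the observed leading-digit frequencies could be computed like this:

```python
import math
from collections import Counter


def benford_expected():
    """Expected probability of each leading digit N: log10(1 + 1/N)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """First nonzero digit of value, or None if there is none (e.g. zero)."""
    # The first nonzero digit in the printed form is the leading
    # significant digit, even for forms like '0.0027' or '1e-07'.
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None


def leading_digit_distribution(values):
    """Observed fraction of values starting with each digit 1 through 9."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        return {}
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}
```

Under Benford's Law about 30% of values start with 1 and fewer than 5% start with 9, so a set of invented numbers with evenly spread first digits should stand out.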

The graph below shows my initial analysis.

For higher resolution see the original PS or converted PDF.

The smooth continuous line shows Benford's Law; real data should stay pretty close to this line, as do most of the companies graphed here. The sample count varies from about 2000 to 5000; after screening out identical numbers, it comes down to about 300 unique samples per company.

One of the problems with this analysis is that many of the numbers in the Google Finance data are derived from each other. For example, the same number can be reported every quarter for a certain type of asset that doesn't change; other numbers are sums of previous numbers. I filtered out the obvious - identical numbers - but the others are harder. On the other hand, we're really looking for differences between companies rather than deviations from the theoretical ideal, so as long as the bias is consistent, it may not be a big deal. Changing the script to filter out duplicate numbers did not change the overall appearance of the graphs much, suggesting we shouldn't worry too much about the input data as long as we have a lot of it.
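To make "differences between companies" concrete, here is one hedged sketch of a deviation score. It uses Pearson's chi-square statistic against Benford's Law, which is a stand-in chosen for illustration and not necessarily how the MQ was computed; `benford_chi_square` and its `unique_only` flag are hypothetical names.

```python
import math
from collections import Counter


def first_digit(value):
    """First nonzero digit of value, or None if there is none."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None


def benford_chi_square(values, unique_only=True):
    """Chi-square distance between observed leading digits and Benford's Law.

    With unique_only=True, identical numbers are counted once, mirroring
    the duplicate filtering described above.
    """
    if unique_only:
        values = set(values)
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no nonzero values to test")
    counts = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford count for digit d
        chi2 += (counts[d] - expected) ** 2 / expected
    return chi2
```

Running this once with `unique_only=True` and once with `unique_only=False` on the same company's data gives a quick check of how much the duplicates matter, and comparing scores across companies sidesteps a consistent bias in the input.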

The MQ Calculator

I have finally completed a prototype that allows you to type in a company ID and get its MQ.

The MQ Calculator: removed due to bit rot.

I would really like someone to take this and turn it into an open source collection of statistical tests. (I don't have time, sadly.)

Acknowledgements

Thanks to David Weekly, Patti Ames, Ka-Ping Yee, Ellen Cousins, and Brian Warner for their help.

DISCLAIMER: This is a toy. Most of the data is probably wrong. If you make investment decisions based on this data you are a durned fool and deserve to lose your money to someone who can make better use of it. You should absolutely not draw conclusions about the law-abidingness or lack thereof of any company based on a few flimsy bug-ridden scripts. Shame on you for even thinking about taking this seriously.

