Author Earnings is an indie author advocacy website run by the anonymous author and tech guru Data Guy, with backing from one of the leading success stories of the indie author community, Hugh Howey. In February 2016 the website celebrated its second birthday with the latest of its quarterly reports comparing the earning potentials of different publishing sectors (indie, small publisher, Amazon Publishing, Big Five). This latest report suffers from the same statistical problems that beset the earlier reports, so rather than responding to it alone, this article addresses the whole project and will be updated as Author Earnings publishes further reports. One important update will be required in March 2016, when Data Guy speaks in two separate sessions at the trade publishing industry's Digital Book World conference. One of the main industry pundits linked to that conference, Porter Anderson, has written a deeply critical assessment of the latest Author Earnings report. Anderson argues that the Author Earnings data would contribute more to the publishing debate if it were independently audited. The stock response from Data Guy's fans is that the data has always been downloadable from the Author Earnings website, but that data is one of the main problems. As Anderson comments, little methodology is explained in the Author Earnings reports, although Data Guy has stated that he will present that methodology to the conference.
A key aspect of any large-scale survey is knowing how the data has been selected, but to find out any methodology from Author Earnings you have to ferret out hints here and there from the reports. This is a major failing and without a clear statement of methodology the data crunching is of limited use. I suspect that the DBW conference will not enlighten us much as Data Guy has had two years of complaints about the lack of a clear methodology and despite being increasingly vociferous about his research has yet to tell us much beyond how many computers he uses to extract information from Amazon's servers.
Data Guy uses a server farm of 250 computers to run the software data spider that gleans information from Amazon's servers. This is the only way to gain information about e-book sales on Amazon as the retailer does not release sales figures. The processing power and technical wizardry in programming the spider is impressive, but the technology must not blind the reports' readers to the reality that the spider does what it is told by one human in consultation with one other human. The data is collected on the basis of what serves the aims of the advocacy project, which is to present a compelling case for self-publishing rather than going through an agent to seek a contract with one of the major publishers. There were two reports produced two years ago: a look at 7000 e-books in a select range of genres and then a larger trawl of 54,000 books. Howey wrote those first two reports and his opening paragraphs of the second report reveal the major failing of the Author Earnings spider. It was instructed to glean sales rankings from most bestseller lists on Amazon.com, which resulted in 60% of the books being non-fiction. Howey explained that this was because of the multiple sub-categories in non-fiction and does not reflect the proportion of sales. Yet those figures are then used to make highly detailed claims about the earning power of the different publishing sectors, despite the data not reflecting sales.
The Author Earnings reports make grand claims about the proportion of bestsellers on Amazon.com that are from indie publishers (their term for self-publishers) as opposed to the proportion sold by the Big Five trade publishers (Penguin Random House, Macmillan, HarperCollins, Simon and Schuster, and Hachette). The problem is that they use "bestseller" to refer to the top 100 books in the Kindle Store's category lists, and their headline-grabbing claims seldom mention that they are not referring to the bestselling books on Amazon as a whole. This category focus is by choice: Howey mentions in the first report that Data Guy analyzed the top 1000 books in the overall bestseller lists in order to establish the bestselling genres as being Mystery/Thriller, Science Fiction and Fantasy, and Romance. Despite claims from Data Guy's fans that the spider can only access the top 100 books in each category, we were told that it could drill down into the overall bestseller list. So Howey and Data Guy have chosen to glean their data from the category lists when they could have gone down the overall bestseller list. That is why the data downloads are of little use for independent researchers - Author Earnings have already biased the results by the way they programmed their spider.
The choice to use category lists rather than the overall bestseller list renders all Author Earnings claims about proportions of sales by industry sector suspect. A book from a Big Five publisher currently sitting at 104 in Thrillers is likely to have sold more copies that day than a book ranked 97 in Asian American Literature, a category in which a single sale of a newly published book can take a book into the top ten or twenty. The results are further skewed by the tendency of trade publishers to keep to the industry standard BISAC categories, leaving indie authors to dominate non-BISAC categories, such as Space Exploration. Howey gives details of this when he explains how hard it was to persuade his print publisher Simon and Schuster to categorize his bestselling Wool in a science fiction sub-category. Trade published books will often top that category as they are added to it by Amazon's software robots (at the time of writing Andy Weir's The Martian is number one), but much of the top 100 in Space Exploration is self-published. This means that the proportion of bestsellers that are self-published will be over-reported.
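The sampling effect described above can be shown with a toy calculation. This is only a sketch of the argument, not a reconstruction of the Author Earnings spider: the catalogue, the sales figures, the sector labels, and every function name below are hypothetical, invented to show how counting titles across category charts can diverge from counting units sold.

```python
from collections import namedtuple

# Hypothetical record type; none of these fields come from Author Earnings data.
Book = namedtuple("Book", "title category sector daily_sales")


def make_toy_catalogue():
    """Build an invented catalogue: one busy category, one niche category."""
    books = []
    # A large, competitive category dominated by trade publishers.
    for i in range(10):
        books.append(Book(f"Thriller {i}", "Thrillers", "big_five", 500 - 40 * i))
    # A niche category where a handful of sales is enough to chart.
    for i in range(10):
        books.append(Book(f"Niche {i}", "Space Exploration", "indie", 10 - i))
    return books


def category_chart_sample(books, top_n):
    """Take the top_n titles from every category list, as a category crawl would."""
    by_category = {}
    for book in books:
        by_category.setdefault(book.category, []).append(book)
    sample = []
    for titles in by_category.values():
        titles.sort(key=lambda b: b.daily_sales, reverse=True)
        sample.extend(titles[:top_n])
    return sample


def overall_chart_sample(books, top_n):
    """Take the top_n titles from a single store-wide list ordered by sales."""
    return sorted(books, key=lambda b: b.daily_sales, reverse=True)[:top_n]


def indie_title_share(sample):
    """Fraction of titles in the sample that are self-published."""
    return sum(b.sector == "indie" for b in sample) / len(sample)


def indie_unit_share(sample):
    """Fraction of units sold in the sample that are self-published."""
    total = sum(b.daily_sales for b in sample)
    return sum(b.daily_sales for b in sample if b.sector == "indie") / total


books = make_toy_catalogue()
category_sample = category_chart_sample(books, top_n=5)
overall_sample = overall_chart_sample(books, top_n=10)

print("indie share of titles, category charts:", indie_title_share(category_sample))
print("indie share of units, category charts: ", indie_unit_share(category_sample))
print("indie share of titles, overall chart:  ", indie_title_share(overall_sample))
```

With these invented numbers, half of the titles gathered from the category charts are self-published, yet those titles account for under 2% of the units sold in the sample, and none of them would appear in a store-wide top ten. Real data would be far messier, but the direction of the distortion is the point.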
Kindle Store Sales Rankings
Author Earnings reports are based around the top 100 lists across most of the categories in the Kindle Store, and occasionally also the Amazon Book Store if print sales are analyzed. In the absence of sales figures from Amazon, these bestseller lists are the only public way to assess how well a book is doing. The publisher will have access to accurate sales figures, but Amazon reveals nothing about eBook sales to anyone else, not even the likes of the New York Times Bestseller List. Amazon also reveals to no-one what factors it takes into account in calculating these category top 100 lists. Anecdotal evidence from self-publishers can build up a certain picture, but only those inside Amazon know for certain. What is known is that Amazon gives preferential ranking to newly published books, which explains why a single sale can take a book into the top ten of an obscure category, while a single sale of the same book 12 months later will not make the top 100. That enables the bestseller lists to serve as customer information about what is newly available, so it makes sense from a business perspective. It is also known that there is a difference between the sales required to hit a high ranking and the sales required to maintain that ranking, which Data Guy acknowledged in the February 2016 report, referring to this article from an indie publisher. It is also known that Amazon likes to see a churn, so that a book with reducing sales falls down the lists faster at Amazon than at other retailers. All of those known factors create difficulties for the Author Earnings reliance on Amazon sales rankings.
Amazon is not only the biggest eBook retailer in the United States, but also a major publisher. Indeed, when it comes to the Kindle Store it is not accurate to talk of a Big Five, as Amazon Publishing outsells all publishers in its own store except for Penguin Random House. That raises the question of why the rhetoric of the Author Earnings reports is about indie publishers against the Big Five and not the Big Six. Their charts almost always have Amazon Publishing in a separate category, which sometimes makes the difference between a claim that the Big Whatever is less or more than 50% of the market. As no-one outside Amazon knows how it calculates its bestseller lists, we do not know whether it is giving a little (or big) ranking boost to its own books. As promotional emails to customers usually contain a high percentage of recommendations for Amazon Publishing books, it may not be necessary to fix the figures in its own favour, but we cannot tell either way because we do not know how bestseller status is calculated.
One of the main promotional outlets for Amazon Publishing is their Kindle First programme. This lets members of Amazon Prime (a paid-for premium shopper status) choose a free eBook each month from a choice of usually four titles that will not be officially published until the following month. Recently the programme has been expanded to allow any Amazon customer to buy one of these Kindle First books for $1.99. This gives a major ranking boost to these books, which often dominate the overall bestseller list, especially at the start of the month when the Kindle First list is announced. That gives Amazon Publishing a major boost in the Author Earnings data and raises further questions as to why their charts separate out Amazon Publishing from the Big Five, four of whom they outsell.
Amazon launched an eBook subscription service called Kindle Unlimited in July 2014, five months after the first Author Earnings reports. For $9.99 it allows members to read as many books as they want from authors who publish exclusively with Amazon plus a small curated selection of non-exclusive titles. To encourage indie publishers to go exclusive with Amazon they claim to boost a book's ranking equivalent to a sale for every Kindle Unlimited member that downloads it. Data Guy has been in denial about how this renders any calculation based on sales rank meaningless. An indie publisher was paid in the initial version of Kindle Unlimited for every time a book was read over the 10% mark and since November 2015 they have been paid a small fee for each page read. If a Kindle Unlimited user returns the book unread the publisher receives no money, but retains the sales rank boost for the download. To further complicate matters sales income can no longer be calculated from rankings, because Kindle Unlimited reads have never been paid at the level of a sale. So a book ranked 19 on the day the Author Earnings spider crawls past may have much less income than a book ranked 29 because of a difference between borrows and sales. Yet in the latest report Data Guy boasts of the amazing accuracy of the predictions when tested against the actual sales figures of some self-published authors who provided their sales details to him. At that point he moved from being in denial about the impact of Kindle Unlimited to presenting fraudulent data.
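The arithmetic behind that rank 19 versus rank 29 comparison can be sketched as follows. Every figure here is an assumption made for illustration: the list price, the royalty rate, the pages read per borrow, and the per-page payout are invented, and the `RankedBook` type and its method names are hypothetical. The only thing taken from the argument above is the premise that a borrow counts towards rank like a sale but pays less than one.

```python
from dataclasses import dataclass


@dataclass
class RankedBook:
    """Hypothetical daily figures for one title; all values are invented."""
    title: str
    sales: int                    # copies bought outright
    borrows: int                  # Kindle Unlimited downloads
    price: float                  # assumed list price in dollars
    royalty_rate: float           # assumed royalty fraction on a sale
    pages_read_per_borrow: float  # assumed average pages actually read
    payout_per_page: float        # assumed per-page payment in dollars

    def rank_units(self):
        """Units that count towards rank if every borrow is treated as a sale."""
        return self.sales + self.borrows

    def income_if_all_units_were_sales(self):
        """What a rank-only estimate would assume the book earned."""
        return self.rank_units() * self.price * self.royalty_rate

    def income_with_borrows(self):
        """Income when borrows are paid per page read instead of per sale."""
        from_sales = self.sales * self.price * self.royalty_rate
        from_borrows = self.borrows * self.pages_read_per_borrow * self.payout_per_page
        return from_sales + from_borrows


# Book A ranks higher because it has more rank-counting units, mostly borrows.
book_a = RankedBook("Rank 19 book", sales=20, borrows=80, price=3.99,
                    royalty_rate=0.70, pages_read_per_borrow=50,
                    payout_per_page=0.005)
# Book B ranks lower, but every unit is a paid sale.
book_b = RankedBook("Rank 29 book", sales=80, borrows=0, price=3.99,
                    royalty_rate=0.70, pages_read_per_borrow=0,
                    payout_per_page=0.005)

for book in (book_a, book_b):
    print(book.title, book.rank_units(),
          round(book.income_if_all_units_were_sales(), 2),
          round(book.income_with_borrows(), 2))
```

Under these assumptions the higher-ranked book has more rank-counting units but earns roughly a third of what the lower-ranked book earns, while a rank-only estimate would credit it with more. The gap would shrink or widen with different assumed figures, which is exactly why the mix of borrows and sales needs to be known before income is inferred from rank.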
Data Guy regularly refines the method of his calculations and has collected a varying amount of data over the history of the reports, yet he still makes comparisons over time, despite the failure to keep the data model consistent.
There are a lot of major problems with the Author Earnings data and interpretations of it in the reports. The Digital Book World conference sessions will be very interesting. It is a shame that I cannot attend it, but I am sure that someone such as Porter Anderson will take notes.