Every industry has begun to channel the power of big data – be it music, sports, or the IT industry.

And the publishing industry is not one to be left behind. Like the music industry, publishing also relies on big-ticket hits.

But predicting bestsellers is not easy. It’s remained an enigmatic art that no one but the sharpest critics and publishing houses can get right – the province of gut instinct and educated guesses.

There are times when these faculties serve the industry well, but when it comes to first-time authors, they get it wrong more often than not.

How do you combat this problem?

If only we had a computer algorithm that was able to identify best-selling texts with at least 80% success…

Oh, but we do! Meet the bestseller-ometer, the subject of the upcoming book The Bestseller Code: Anatomy of the Blockbuster Novel by Jodie Archer, a former research lead on literature at Apple, and Matthew L. Jockers, an associate professor of English at the University of Nebraska-Lincoln. The claimed success rate comes from applying the algorithm retrospectively to novels from the last 30 years and checking how well it picks out the New York Times bestsellers among them.
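To make the idea of a retrospective success rate concrete, here is a minimal sketch of how such a figure could be computed: compare each prediction with what actually happened and count the matches. The function name `hit_rate` and the five toy labels are made up for illustration; they are not data or code from the book.

```python
def hit_rate(predictions, actual):
    """Share of titles where the prediction matches the known outcome."""
    if len(predictions) != len(actual):
        raise ValueError("predictions and actual must have the same length")
    if not actual:
        raise ValueError("need at least one title to score")
    correct = sum(p == a for p, a in zip(predictions, actual))
    return correct / len(actual)


# Made-up toy labels: True means "bestseller", False means "not a bestseller".
predicted = [True, True, False, False, True]
was_bestseller = [True, False, False, False, True]

rate = hit_rate(predicted, was_bestseller)
print(f"Retrospective hit rate: {rate:.0%}")
```

A single figure like this hides whether the misses are overlooked hits or false alarms, so a real evaluation would likely report those separately.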

The workings – How does this algorithm work?

The bestseller-ometer is an attempt to identify the characteristics of best-selling fiction at scale by interrogating a huge body of literature – more than 20,000 novels, say. The project provides a data-driven check on received wisdom about the secrets behind bestselling fiction. It also comes with a fear, however: that publishers may one day simply turn to this technology and bypass the traditional methods of picking a prospective bestseller.

The dawn of an idea – but was it really?

The algorithm built by Jockers and Archer isn’t the first attempt to apply the power of big data to books. Inkitt, the Berlin startup behind what has been called the ‘first novel selected by an algorithm’, closely tracks readers’ responses to stories posted on its web platform in order to identify potential bestsellers.

Founded in 2011, London’s Jellybooks measures reader engagement before books are published, using software that readers download in exchange for advance access to a title.

What sets the bestseller-ometer apart, however, is that it joins literary scholarship to computational horsepower. The Bestseller Code documents the extensive work that went into training the machine to read and unpack the micro-decisions, at the level of diction and syntax, involved in crafting best-selling fiction.

The algorithms reflect the analytical and interpretive choices involved in reading a single book closely. They look for features such as repetitions, word usage patterns, allusions, and thematic emphases.
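As a rough illustration of what counting repetitions and word usage patterns can look like in code, here is a minimal sketch using only the Python standard library. It is not the authors' method: the function name `surface_features`, the features it returns, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def surface_features(text):
    """Compute a few simple word-level features of a text (illustrative only)."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    # Number of word occurrences that belong to a word used more than once.
    repeated = sum(c for c in counts.values() if c > 1)
    return {
        "word_count": total,
        "distinct_words": len(counts),
        "repetition_rate": repeated / total if total else 0.0,
        "top_words": counts.most_common(3),
    }


sample = "He ran. He ran fast. She ran faster."
features = surface_features(sample)
print(features)
```

Features like allusions and thematic emphases need far more than word counts, so this only hints at the simplest end of the spectrum.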

The elements – What the algorithm uses

A few common elements that the algorithm picks up on are an authoritative “voice”; spare, plainspoken, often colloquial prose; and declarative verbs that connote action-oriented, take-charge characters.

A less common element is something Archer and Jockers call narrative “cohesion”, a habit shared by many best-selling authors. For example, John Grisham usually devotes one third of his novels to his signature topic – law and lawyers.
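One crude way to put a number on how much of a text is devoted to a signature topic is to count the share of words that come from a topic vocabulary. The sketch below does just that with plain keyword matching, which is a deliberate simplification and not the technique used in the book. The word list `LAW_TERMS`, the function `topic_share`, and the sample sentence are hypothetical.

```python
import re

# Hypothetical, hand-picked vocabulary for a "law and lawyers" topic.
LAW_TERMS = {"law", "lawyer", "lawyers", "court", "judge", "jury", "trial", "verdict"}


def topic_share(text, topic_terms):
    """Return the fraction of words in text that belong to topic_terms."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in topic_terms)
    return hits / len(words)


sample = "The lawyer faced the judge while the jury waited"
share = topic_share(sample, LAW_TERMS)
print(f"Share of words on the law topic: {share:.2f}")
```

A word-level share is not the same as a third of a novel's pages or scenes, so treat the result as a loose proxy at best.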

The secret to bestselling – according to bestseller-ometer  

There were also some confounding discoveries – for instance, that sex usually doesn’t sell. In fact, it is distinctly unpopular with readers and is usually confined to a small portion of best-selling material.

Take Fifty Shades of Grey, for example – the book is packed with heavy erotic scenes and a plot twist. Going by that finding, it should not have become a bestseller.

However, Jockers and Archer found that the book’s main theme is human closeness – the theme most prevalent across bestsellers. Fifty Shades revolves mainly around emotional intimacy between its characters, which is what made it a bestseller.

The drawback

With bestselling authors like J.K. Rowling and John Grisham already taking the stage, publishers are often less inclined to divert their funds to unknown authors, which is where the bestseller-ometer will step in. The major concern, however, is that writers without any literary pedigree can now tailor their work to satisfy the algorithm.

The situation can go one of two ways – a good book that deserves its bestseller status, or a bad book that satisfies the needs of the algorithm but not those of literature.

Conclusion

As I have said previously, big data is taking over every aspect of the world. With greater use comes greater demand – in every industry, by the looks of it.

Life in big data is always an adventure. Working with data, detecting problems, finding solutions, and predicting the future all come with the job. If you find this industry intriguing and you believe this is the place for you, then take up a certification in big data and begin your journey in this adventurous world.