Predicting My Next Favorite Book 📚 Thanks to Goodreads’ Data and Machine Learning 👨‍💻

Simon Pastor
11 min read · Jun 21, 2021

“If you don’t like to read, you haven’t found the right book.” — J.K. Rowling

This article is quite different from the ones you might be used to. I really enjoyed working on it, and I hope you’ll enjoy reading it 🙂

If you want to discover more projects I’ve been working on, feel free to check out my website!

I recently realized that I wasn’t spending enough time reading, so last year I started challenging myself to read more. Around that time I came across Goodreads, a social network for books. On this platform, you can see what your friends are reading or have read and how they rated each book. You can also record the books you’ve read, save the ones you want to read, and much more.

@roadtripwithraj

A few weeks ago, I discovered that I could download my Goodreads data. As a data enthusiast, I thought it would be fun to have a look at it and see if I could use it to build personalized book recommendations! Here’s how I did it 👇

Part 1 — Having Fun With The Data

The data I downloaded contained 441 books. It included the ones I had read (265), the ones I’m currently reading (3) and those I want to read (173). For each book it included all (or almost all) of the following details: Book id, Title, Author, ISBN, My Rating, Average Rating (given by all Goodreads members), Publisher, Number of Pages, Year Published.
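As a rough illustration of that first look at the data, here is a minimal sketch that counts books per shelf using only Python's standard library. It is not the code used for this article: the rows are made up, and the `Exclusive Shelf` column name and its values (`read`, `currently-reading`, `to-read`) are assumptions about the layout of the export file.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the downloaded export file.
# The column names, in particular "Exclusive Shelf", are assumptions.
SAMPLE_EXPORT = """Title,Author,Exclusive Shelf
Book A,Aldous Huxley,read
Book B,Albert Camus,read
Book C,Anthony Horowitz,currently-reading
Book D,Milan Kundera,to-read
"""


def count_by_shelf(csv_file):
    """Return a Counter mapping each shelf name to its number of books."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Exclusive Shelf"] for row in reader)


shelf_counts = count_by_shelf(io.StringIO(SAMPLE_EXPORT))

# Print the shelves from the largest to the smallest.
for shelf, n_books in shelf_counts.most_common():
    print(f"{shelf}: {n_books}")
```

With a real export, the `io.StringIO(...)` stand-in would be replaced by an opened file, for example `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the download was saved.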

So I started having a bit of fun with the data. Here’s a graph showing the number of books in my library per author. As you can see, the authors of my childhood books, René Goscinny (Asterix), Mary Pope Osborne (Magic Tree House), Robert Muchamore (CHERUB) and Anthony Horowitz (Alex Rider), lead the ranking. They are followed by “adult” authors such as Kundera, Camus, Molière and Huxley.

Another interesting graph shows the number of books in my library per year of publication.
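Both graphs boil down to counting books per group, once by author and once by publication year. Here is a minimal sketch of that aggregation, again with the standard library only. The `(author, year)` pairs are illustrative, not taken from the real library, and `count_by` is a hypothetical helper. Since not every book comes with every detail, missing values are skipped.

```python
from collections import Counter

# Illustrative (author, year published) pairs, not the real library.
# None stands for a missing publication year.
books = [
    ("René Goscinny", 1961),
    ("René Goscinny", 1965),
    ("René Goscinny", 1961),
    ("Albert Camus", 1942),
    ("Aldous Huxley", None),
]


def count_by(records, index):
    """Count records by the value at `index`, skipping missing values."""
    return Counter(
        record[index] for record in records if record[index] is not None
    )


per_author = count_by(books, 0)
per_year = count_by(books, 1)

# Authors ranked by number of books, drawn as a text bar chart.
for author, n_books in per_author.most_common():
    print(f"{author:<15} {'#' * n_books}")

# Publication years in chronological order.
for year in sorted(per_year):
    print(year, "#" * per_year[year])
```

The two `Counter` objects hold exactly the values a bar chart needs, so they could be handed to any plotting library in place of the text bars.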

Simon Pastor

🇪🇺🇫🇷🇺🇸 @YaleMPP • Previously @LSEGovernment, Institut Montaigne, CitizenLab, UN 🇺🇳