Gender statistics with the Goodreads API

Inspired by this tweet about the gender distribution of books read, I wanted to find out, as a small addendum to Lesestoff 2019, how things look for me. I had already thought about it a few times during the year.

tl;dr: Pretty modest. 14 percent of the books were written by women, or at most 44 percent with some creative accounting.

With 44 books in total, I did not read so much last year that counting by hand would be overly tedious... but that would be boring, and after all, what does Goodreads have an API for? The goodreads gem wraps it, so after a short visit to the settings to generate my own API key, a few lines of Ruby code are enough to fetch the relevant data:

require 'goodreads'

my_api_key = 'abc'
my_api_secret = 'xyz'
my_user_id = 123 # the numeric value in your profile URL

# initialize API wrapper
client = Goodreads::Client.new(api_key: my_api_key, api_secret: my_api_secret)

page_number = 1
last_page_reached = false

# we only receive 10 books per page, so we
# need a loop to go through all read books
while not last_page_reached
  # this call returns the 10 books, also how many books there are
  # in that shelf in total, and at which book the current page ends
  readshelf = client.shelf(my_user_id, "read", page: page_number)

  # the 10 books are in the "books" array, so we iterate over that
  readshelf.books.each do |shelf_entry|
    # but the array actually contains "shelf entries", not books.
    # makes sense, because a book could be read multiple times, so
    # having only one "read_at" value per book would not make sense
    # and instead that information is stored in the shelf entry
    book = shelf_entry.book
    author = client.author(book.authors.author.id)
   
    # turns out that gender is not set for quite a lot of authors
    gender = author.gender || "unknown"

    # turns out you can mark books as read without giving a date
    # (the year is the last four characters of the date string)
    read_in_year = shelf_entry.read_at ? shelf_entry.read_at[-4..-1] : 'XXXX'

    # output the information of the current book
    puts "#{read_in_year}, #{gender}: #{book.title} (#{author.name})"
  end

  # repeat with the next page or stop processing after the last book
  if readshelf.end < readshelf.total
    page_number = page_number + 1
  else
    last_page_reached = true
  end
end

“Surprisingly”, the Goodreads data is fairly poor, and no gender is recorded even for very popular authors. Here is the output of the script (sorted by me in a text editor and then limited to 2019):

2019, female: A Conjuring of Light (Shades of Magic, #3) (V.E. Schwab)
2019, female: A Darker Shade of Magic (Shades of Magic, #1) (V.E. Schwab)
2019, female: A Gathering of Shadows (Shades of Magic, #2) (V.E. Schwab)
2019, female: The Ethical Slut: A Guide to Infinite Sexual Possibilities (Dossie Easton)
2019, male: And Another Thing... (Hitchhiker's Guide to the Galaxy, #6) (Eoin Colfer)
2019, male: Baptism of Fire (The Witcher #3) (Andrzej Sapkowski)
2019, male: Blood of Elves (Witcher, #1) (Andrzej Sapkowski)
2019, male: City of Blades (The Divine Cities, #2) (Robert Jackson Bennett)
2019, male: City of Miracles (The Divine Cities, #3) (Robert Jackson Bennett)
2019, male: City of Stairs (The Divine Cities, #1) (Robert Jackson Bennett)
2019, male: Digital Minimalism: Choosing a Focused Life in a Noisy World (Cal Newport)
2019, male: Foundryside (Founders, #1) (Robert Jackson Bennett)
2019, male: Lady of the Lake (The Witcher, #7) (Andrzej Sapkowski)
2019, male: Life, the Universe and Everything (Hitchhiker's Guide to the Galaxy, #3) (Douglas Adams)
2019, male: Mostly Harmless (Hitchhiker's Guide to the Galaxy, #5) (Douglas Adams)
2019, male: Scott Pilgrim, Vol. 5: Scott Pilgrim vs. the Universe (Bryan Lee O'Malley)
2019, male: Scott Pilgrim, Vol. 6: Scott Pilgrim's Finest Hour (Bryan Lee O'Malley)
2019, male: Scott Pilgrim, Volume 1: Scott Pilgrim's Precious Little Life (Bryan Lee O'Malley)
2019, male: Scott Pilgrim, Volume 2: Scott Pilgrim vs. The World (Bryan Lee O'Malley)
2019, male: Scott Pilgrim, Volume 3: Scott Pilgrim & The Infinite Sadness (Bryan Lee O'Malley)
2019, male: Scott Pilgrim, Volume 4: Scott Pilgrim Gets It Together (Bryan Lee O'Malley)
2019, male: Season of Storms (The Witcher #0) (Andrzej Sapkowski)
2019, male: So Long, and Thanks for All the Fish (Hitchhiker's Guide to the Galaxy, #4) (Douglas Adams)
2019, male: Sword of Destiny (The Witcher, #0.75) (Andrzej Sapkowski)
2019, male: The Hitchhiker's Guide to the Galaxy (Hitchhiker's Guide to the Galaxy, #1) (Douglas Adams)
2019, male: The Last Wish (The Witcher, #0.5) (Andrzej Sapkowski)
2019, male: The Restaurant at the End of the Universe (Hitchhiker's Guide to the Galaxy, #2) (Douglas Adams)
2019, male: The Time of Contempt (The Witcher, #2) (Andrzej Sapkowski)
2019, male: The Tower of the Swallow (The Witcher, #4) (Andrzej Sapkowski)
2019, unknown: Daring Greatly: How the Courage to Be Vulnerable Transforms the Way We Live, Love, Parent, and Lead (Brené Brown)
2019, unknown: Manifest gegen die Arbeit (Gruppe Krisis)
2019, unknown: The Stone Sky (The Broken Earth, #3) (N.K. Jemisin)
2019, unknown: Watchmen #10: Two Riders Were Approaching (Alan Moore)
2019, unknown: Watchmen #11: Look upon my works, Ye mighty... (Alan Moore)
2019, unknown: Watchmen #12: A Stronger Loving World (Alan Moore)
2019, unknown: Watchmen #1: At Midnight, All The Agents.... (Alan Moore)
2019, unknown: Watchmen #2: Absent Friends (Alan Moore)
2019, unknown: Watchmen #3: The Judge Of All The Earth (Alan Moore)
2019, unknown: Watchmen #4: Watchmaker (Alan Moore)
2019, unknown: Watchmen #5: Fearful Symmetry (Alan Moore)
2019, unknown: Watchmen #6: The Abyss Gazes Also (Alan Moore)
2019, unknown: Watchmen #7: A Brother To Dragons (Alan Moore)
2019, unknown: Watchmen #8: Old Ghosts (Alan Moore)
2019, unknown: Watchmen #9: The Darkness of Mere Being (Alan Moore)
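
The sorting and filtering that was done by hand in a text editor could also be done in Ruby. The following is only a minimal sketch, assuming the script output has been collected as an array of lines; the helper name `lines_for_year` and the two entries marked as hypothetical are not part of the original script.

```ruby
# Keep only the lines for one year and sort them alphabetically.
# Because the gender label directly follows the year, plain sorting
# also groups the lines by label.
def lines_for_year(lines, year)
  lines.select { |line| line.start_with?("#{year},") }.sort
end

sample = [
  "2018, male: Some Older Book (Hypothetical Author)",   # hypothetical entry
  "2019, male: Foundryside (Founders, #1) (Robert Jackson Bennett)",
  "XXXX, unknown: Undated Book (Hypothetical Author)",   # hypothetical entry
  "2019, female: A Darker Shade of Magic (Shades of Magic, #1) (V.E. Schwab)"
]

puts lines_for_year(sample, 2019)
```

Entries without a date carry the `XXXX` placeholder from the script and are dropped by the year filter.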

So the next step is to quickly [1] research all the “unknown” entries by hand and fill them in, after which the list looks like this:

2019, female: A Conjuring of Light (Shades of Magic, #3) (V.E. Schwab)
2019, female: A Darker Shade of Magic (Shades of Magic, #1) (V.E. Schwab)
2019, female: A Gathering of Shadows (Shades of Magic, #2) (V.E. Schwab)
2019, female: The Ethical Slut: A Guide to Infinite Sexual Possibilities (Dossie Easton)
2019, male: And Another Thing... (Hitchhiker's Guide to the Galaxy, #6) (Eoin Colfer)
2019, male: Baptism of Fire (The Witcher #3) (Andrzej Sapkowski)
2019, male: Blood of Elves (Witcher, #1) (Andrzej Sapkowski)
2019, male: City of Blades (The Divine Cities, #2) (Robert Jackson Bennett)
2019, male: City of Miracles (The Divine Cities, #3) (Robert Jackson Bennett)
2019, male: City of Stairs (The Divine Cities, #1) (Robert Jackson Bennett)
2019, male: Digital Minimalism: Choosing a Focused Life in a Noisy World (Cal Newport)
2019, male: Foundryside (Founders, #1) (Robert Jackson Bennett)
2019, male: Lady of the Lake (The Witcher, #7) (Andrzej Sapkowski)
2019, male: Life, the Universe and Everything (Hitchhiker's Guide to the Galaxy, #3) (Douglas Adams)
2019, male: Mostly Harmless (Hitchhiker's Guide to the Galaxy, #5) (Douglas Adams)
2019, male: Scott Pilgrim, Vol. 5: Scott Pilgrim vs. the Universe (Bryan Lee O'Malley)
2019, male: Scott Pilgrim, Vol. 6: Scott Pilgrim's Finest Hour (Bryan Lee O'Malley)
2019, male: Scott Pilgrim, Volume 1: Scott Pilgrim's Precious Little Life (Bryan Lee O'Malley)
2019, male: Scott Pilgrim, Volume 2: Scott Pilgrim vs. The World (Bryan Lee O'Malley)
2019, male: Scott Pilgrim, Volume 3: Scott Pilgrim & The Infinite Sadness (Bryan Lee O'Malley)
2019, male: Scott Pilgrim, Volume 4: Scott Pilgrim Gets It Together (Bryan Lee O'Malley)
2019, male: Season of Storms (The Witcher #0) (Andrzej Sapkowski)
2019, male: So Long, and Thanks for All the Fish (Hitchhiker's Guide to the Galaxy, #4) (Douglas Adams)
2019, male: Sword of Destiny (The Witcher, #0.75) (Andrzej Sapkowski)
2019, male: The Hitchhiker's Guide to the Galaxy (Hitchhiker's Guide to the Galaxy, #1) (Douglas Adams)
2019, male: The Last Wish (The Witcher, #0.5) (Andrzej Sapkowski)
2019, male: The Restaurant at the End of the Universe (Hitchhiker's Guide to the Galaxy, #2) (Douglas Adams)
2019, male: The Time of Contempt (The Witcher, #2) (Andrzej Sapkowski)
2019, male: The Tower of the Swallow (The Witcher, #4) (Andrzej Sapkowski)
2019, female: Daring Greatly: How the Courage to Be Vulnerable Transforms the Way We Live, Love, Parent, and Lead (Brené Brown)
2019, female: The Stone Sky (The Broken Earth, #3) (N.K. Jemisin)
2019, male: Watchmen #10: Two Riders Were Approaching (Alan Moore)
2019, male: Watchmen #11: Look upon my works, Ye mighty... (Alan Moore)
2019, male: Watchmen #12: A Stronger Loving World (Alan Moore)
2019, male: Watchmen #1: At Midnight, All The Agents.... (Alan Moore)
2019, male: Watchmen #2: Absent Friends (Alan Moore)
2019, male: Watchmen #3: The Judge Of All The Earth (Alan Moore)
2019, male: Watchmen #4: Watchmaker (Alan Moore)
2019, male: Watchmen #5: Fearful Symmetry (Alan Moore)
2019, male: Watchmen #6: The Abyss Gazes Also (Alan Moore)
2019, male: Watchmen #7: A Brother To Dragons (Alan Moore)
2019, male: Watchmen #8: Old Ghosts (Alan Moore)
2019, male: Watchmen #9: The Darkness of Mere Being (Alan Moore)
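
Instead of editing the output by hand, the corrections above could also be fed back into the script as a lookup table. This is a rough sketch, not what was actually done here: the constant `GENDER_OVERRIDES` and the helper `resolve_gender` are hypothetical names, and the values are simply the three corrections visible in the list above.

```ruby
# Manual corrections, keyed by the author name exactly as it appears in
# the script output. Values are taken from the corrected list.
GENDER_OVERRIDES = {
  "Brené Brown"  => "female",
  "N.K. Jemisin" => "female",
  "Alan Moore"   => "male"
}.freeze

# Hypothetical helper: prefer the value from Goodreads, then fall back
# to the manual override, then to "unknown".
def resolve_gender(api_gender, author_name)
  return api_gender unless api_gender.nil? || api_gender.empty?

  GENDER_OVERRIDES.fetch(author_name, "unknown")
end

puts resolve_gender(nil, "Alan Moore")        # missing in Goodreads, overridden
puts resolve_gender("female", "V.E. Schwab")  # Goodreads value is kept
puts resolve_gender(nil, "Gruppe Krisis")     # no override, stays unknown
```

In the script, this would replace the line `gender = author.gender || "unknown"` with something like `gender = resolve_gender(author.gender, author.name)`.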

Had the Goodreads data been clean from the start, I would of course have done the percentage calculation, the grouping by year and so on directly in the script. Instead, the evaluation can just as easily be run in the terminal on this manually corrected list, saved as a text file:

$ awk -F"[,:]" '{col[$2]++} END {for (i in col) print i, col[i]}' list.txt | sort -k 2
 male 37
 female 6
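
For comparison, the same count could be written in Ruby. The sketch below is just one possible equivalent of the awk one-liner; the helper name `count_by_gender` is made up for illustration, and the sample consists of three lines from the list above.

```ruby
# Count lines per gender label. Each line is expected to look like
# "2019, female: Some Title (Some Author)".
def count_by_gender(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    # The label sits between the first comma and the first colon.
    match = line.match(/\A[^,]*,\s*([^:]+):/)
    counts[match[1].strip] += 1 if match
  end
end

sample = [
  "2019, female: The Stone Sky (The Broken Earth, #3) (N.K. Jemisin)",
  "2019, male: Foundryside (Founders, #1) (Robert Jackson Bennett)",
  "2019, male: Digital Minimalism: Choosing a Focused Life in a Noisy World (Cal Newport)"
]

count_by_gender(sample).each do |gender, count|
  puts "#{gender} #{count}"
end
```

To run it on the full file, pass `File.readlines("list.txt")` instead of `sample`. Lines that do not match the expected shape are skipped rather than counted.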

Almost done. The last step is easily done in my head with Alfred: 6 / (37 + 6) * 100 is 14% when rounded up. Not very much at all. If I leave out the comics (18 titles in total), it is 24 percent. If I count each book series as a single entry instead of counting individual books, the result is roughly 36% (44% without comics). At least for non-fiction it is 2/3.
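
The arithmetic behind these figures can be spelled out in a few lines of Ruby. Treat this as a sketch: the helper `female_share` is a made-up name, and the series counts (4 entries by women, 7 by men, 5 by men without comics) are my own grouping of the list above, assuming that all 18 comics are by men and that the Eoin Colfer book counts as part of the Hitchhiker's series.

```ruby
# Share of books by women in percent, rounded to the nearest integer.
def female_share(female, male)
  (100.0 * female / (female + male)).round
end

puts female_share(6, 37)       # all books                         → 14
puts female_share(6, 37 - 18)  # without the 18 comics             → 24
puts female_share(4, 7)        # each series counted once          → 36
puts female_share(4, 5)        # series counted once, no comics    → 44
```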


  1. Not that easy for the Manifest gegen die Arbeit (which is well worth reading!). I took it out of the count, because it is also not a book in the classical sense.