The following is a review of the book Who Makes the NBA. Of particular relevance to this Substack newsletter is the author’s extensive use of ChatGPT for data analysis. A brief interview with the author follows the review.
The world of book publishing is a fairly staid and conservative one. For many years we’ve been told that self-publishing, whether through Amazon or another service, would disrupt publishing, and finally we’d be done with publishers, editors, etc. And yet, they’ve persisted. Well, I just read a new book, Who Makes the NBA, by Seth Stephens-Davidowitz, who is a data scientist and economist. Seth reached out to me on Twitter, after I responded to one of his tweets, and he asked me if I was interested in reviewing the book. When I said yes, he sent me a free copy of the book. In exchange for the free copy of the book, I let him see this review before I published it. I’ve received no compensation from him for this review.
What makes this book especially interesting isn’t that it is self-published, though it is. No, what makes this book interesting is ChatGPT. Before you turn away in disgust, bear with me. The author tells me that he didn’t use ChatGPT to write the text. Rather, he used its analytical capabilities for the book’s quantitative analysis. Stephens-Davidowitz’s book stands out not only for its insightful exploration of basketball, but also for its use of AI in accelerating research and data analysis. His use of ChatGPT’s Data Analysis tool (formerly called Code Interpreter) presents us with profound implications for the future of quantitatively-oriented books.
The book begins by exploring the history of basketball, invoking the legend of James Naismith and his peach baskets. This invention set the stage for the sport that dominated the ’80s and ’90s. Through meticulous analysis, Stephens-Davidowitz uncovered an interesting correlation between a player’s height and their likelihood of making it to the NBA. For instance, the probability of a man over 7 feet tall reaching the NBA is around 1 in 7.2, while someone who is under 5’10” has only a 1 in 3.8 million chance. This kind of analysis deftly illustrates the role that height plays in basketball. Of course, you might be thinking, everyone knows that height matters in basketball! Yes, of course, everyone knows that, but the point here is that its quantification provides us with robust information about how much it matters.
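To put those two figures side by side, here is a quick back-of-the-envelope sketch in Python. It uses only the two odds quoted above; the variable names are my own, and this is not the author’s code or method.

```python
# Odds quoted in the review: 1 in 7.2 for men over 7 feet,
# 1 in 3.8 million for men under 5'10".
p_seven_footer = 1 / 7.2
p_under_5_10 = 1 / 3_800_000

# How many times more likely is the 7-footer to reach the NBA?
ratio = p_seven_footer / p_under_5_10

print(f"Chance for a 7-footer: {p_seven_footer:.1%}")
print(f"Relative advantage: roughly {round(ratio):,} to 1")
```

Taken at face value, the quoted odds imply that a 7-footer is on the order of half a million times more likely to reach the NBA than a man under 5’10”.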
Further analysis notes that extremely tall NBA players are not necessarily the best athletes. For every Victor Wembanyama, there is a Manut Bol or Shawn Bradley. The tallest players’ jumping, speed, and agility metrics don’t match those of shorter players. This suggests that, up to a point, height provides a unique advantage that can compensate for other deficiencies. This revelation invites a reevaluation of how athletic performance is measured and valued in professional basketball.
One of the book’s most innovative contributions is the introduction of the MUGGSIES metric[1] (Metric for Understanding Game Given Sporting Individual’s Effectiveness & Size), a tool developed to evaluate players on a height-adjusted basis. This analysis leads to a reshuffling of the hierarchy of basketball greatness, challenging traditional perceptions and highlighting the underappreciated skills of shorter players. It’s a powerful testament to the author’s ability to leverage AI for novel insights, providing a fresh perspective on a well-trodden subject.
Stephens-Davidowitz also explores the globalization of basketball, noting the increasing diversity in the NBA, with rising numbers of foreign-born players. This section not only chronicles the sport’s international expansion but also underscores the uneven distribution of opportunities globally. The lack of Indian and Chinese NBA players (aside, obviously, from Yao Ming) raises questions about talent cultivation and access to sports infrastructure.
In perhaps the most provocative part of the book, the author examines the role of genetics in basketball talent. Through an analysis of identical twins in the NBA, he posits that basketball skill is highly genetically determined, more so than in many other sports. This argument adds a new dimension to the nature vs nurture debate in sports talent development.
Lastly, the book touches on the socio-economic backgrounds of NBA players, using unique names (LeBron James, for example) as a proxy for class and educational background. This analysis reveals that NBA players are more likely to come from middle-class or affluent backgrounds (Michael Jordan’s parents, for example, worked at a bank). This challenges the stereotype of basketball as a path out of poverty. (As an aside, the documentary Hoop Dreams is a must-watch on this topic.)
Who Makes the NBA is more than a book about basketball. It’s a testament to the power of AI in reshaping research and storytelling. Stephens-Davidowitz’s work not only provides groundbreaking insights into basketball but also serves as a blueprint for how AI can be harnessed to deepen our understanding of any field. The lessons drawn from this book are applicable far beyond the hardwood, offering a glimpse into a future where AI and human intelligence collaborate to uncover truths hidden in plain sight.
Some questions for the author
You mentioned the high likelihood of 7-footers making it to the NBA. How does this statistic influence the scouting and training of young basketball players?
Roughly 1 in 7 7-footers reaches the NBA. One implication of this is that, if you are 7 feet tall, your potential in basketball is obvious. 7-footers from around the world are spotted and trained. If you have basketball talent but are shorter, your potential to reach the NBA is much lower. While more than 50 percent of 7-footers come from outside the United States, fewer than 10 percent of the shortest NBA players come from outside the United States.
Regarding your chapter on the genetic advantages in basketball, how do you balance the role of natural talent versus training and skill development?
The way to disentangle genetics versus other advantages is to look at twins. If something is highly genetic, identical twins will be much more similar than other same-sex siblings. It turns out that basketball is far more dominated by identical twins than other sports.
In your study, you observed a relationship between socioeconomics and NBA success. What implications does this have for youth sports programs and talent scouting?
Coming from a more privileged background (a wealthier zip code, a two-parent home, etc.) is a strong positive predictor of reaching the NBA. One of the reasons for this seems to be developing non-cognitive skills, such as trust and discipline. Even among NBA players, those from more underprivileged backgrounds commit technical fouls at a higher rate. This is one more piece of evidence of how much talent we lose due to poverty in the United States.
When using AI for data analysis and content creation, how did you decide which tasks to delegate to AI versus handling personally?
The biggest limitations right now for AI data analysis are memory limitations. If you need to perform a computationally intensive analysis, you can’t do it in ChatGPT. Pretty much everything else, AI can do. That said, there are some things that you have to double and triple check. The more you use AI for data analysis, the more you know when you can rely on the analysis and when you have to double and triple check the analysis. For example, every time I merge a dataset, I go over the code and examine the merged dataset manually to make sure that AI didn’t screw anything up.
Could you discuss a specific instance where AI provided an unexpected insight or perspective during your research for the book?
I needed an objective measure of how difficult the childhood backgrounds of NBA players were. No such objective measure existed. But I realized ChatGPT knows a ton about the background of every player. I asked it to rank the difficulty, 1 to 10, and it gave very sensible rankings. In some sense, this is the same thing researchers might pay people on Mechanical Turk to do—to gather information and try to objectively code it. But ChatGPT does it thousands of times quicker.
In terms of writing efficiency, how did AI help in organizing and structuring the vast amount of data and information you collected?
If writing has to be great, you can’t rely on LLMs. Very little of the book was written with any assistance from ChatGPT. I used ChatGPT to do the data analysis and make the art. And then I wrote the book myself. That said, there was an appendix where I describe how I used AI. I didn’t feel the writing for this section had to be great (it was an appendix, after all). So I wrote a very rough outline with the points I wanted to make and told ChatGPT to turn it into prose.
You mentioned the challenge of ensuring the truthfulness of AI-generated content. What strategies did you employ to verify and fact-check the AI’s output?
When I used AI for data analysis, ChatGPT shows you all the code that it is running. So you can see and make sure that it is doing everything it is supposed to. For data analysis, the AI doesn’t really hallucinate. It does, however, make the same silly errors that any coder may make. For example, when merging a dataset where there are many players with the same name, it may do the merge incorrectly. The way to deal with this is the same way you would deal with your own code: check the newly merged dataset to make sure the code has done what you thought it was going to do.
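A note from me rather than the author: the kind of post-merge check he describes can be sketched in a few lines of plain Python. The rows, field names, and helper functions below are all hypothetical and invented for illustration; this is not the code ChatGPT generated for the book.

```python
from collections import Counter


def ambiguous_keys(rows, key):
    """Return the key values that appear more than once in rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def merge_on(left, right, key):
    """Naive inner join on one key, pairing every matching row."""
    merged = []
    for left_row in left:
        for right_row in right:
            if left_row[key] == right_row[key]:
                merged.append({**left_row, **right_row})
    return merged


# Hypothetical data: two different players share the name "Player A".
players = [
    {"name": "Player A", "birth_year": 1960},
    {"name": "Player A", "birth_year": 1985},
    {"name": "Player B", "birth_year": 1990},
]
stats = [
    {"name": "Player A", "points": 1000},
    {"name": "Player A", "points": 2000},
    {"name": "Player B", "points": 1500},
]

merged = merge_on(players, stats, "name")

# Two checks worth running after any merge:
duplicated_names = ambiguous_keys(players, "name")  # keys that are not unique
extra_rows = len(merged) - len(players)             # rows the merge invented

print("Names shared by more than one player:", duplicated_names)
print("Unexpected extra rows after merge:", extra_rows)
```

Merging on name alone pairs each "Player A" with both stat lines, so the merged table has more rows than the player list. In pandas, which is what ChatGPT's Data Analysis tool typically writes, the `validate` and `indicator` arguments of `DataFrame.merge` serve a similar purpose.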
Finally, what limitations did you encounter while using AI in your creative process, and how did you overcome them?
There were many limitations. For example, sometimes even the best piece of art I could create had a person with 6 fingers in it. However, I was generally blown away by how much AI could do. Time and time again, I was unsure whether AI could help. And just about every time I was uncertain, the answer was yes.
[1] Named, obviously, for the diminutive Muggsy Bogues.