Since finishing up The Shining diorama I’ve thrown myself into a new project, this one involving AI. It’s a stupid project—which is why I love it—promising neither revolutions nor solutions, just grist for the insatiable tech mill fueled by capital. I’ll post more about the project shortly, but for now I just wanted to document some of my early experiences with ChatGPT.

Game stats for the great Curtis Martin from Madden 2001 that ChatGPT transcribed no problem
I’ve been using AI to read screenshots of stats and game results from Madden 2001 (a 25-year-old video game) and then structure the data so it can be easily moved to a spreadsheet. That spreadsheet will then be fodder for blog posts, radio shows, highlight reels, etc. I was really impressed with how accurate the OCR readings of the screenshots I uploaded a week or so ago were, not to mention how quick and easy it was to structure the data from the images into tables.
Getting excited, the other day I tried importing a 16-second video to see if it could identify and transcribe similar data. ChatGPT crashed hard. Knowing video was a long shot, I followed the bot’s advice and broke the video down into about 64 screenshots featuring win-loss data for 31 NFL teams over 16 weeks. The results were a shit show. For context, I wanted to get a table containing the following information:*
- Overall Wins and Losses for the season (should be easy given it’s already on the screenshots)
- Divisional Wins and Losses (record of games within the same division)
- Conference Wins and Losses (record for each team against another AFC team)
- Results of head-to-head games between any of the following teams: Jets, Dolphins, Patriots, Ravens, Titans, Steelers, Raiders
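For what it’s worth, once the per-game results are transcribed the tallying itself is mechanical. Here’s a minimal Python sketch of how the three record types could be computed from a list of game results. The team-to-division mapping and the data layout are my own illustrative assumptions, not anything pulled from the game or from ChatGPT:

```python
from collections import defaultdict

# Illustrative division assignments (2000 NFL alignment for a few AFC teams).
DIVISION = {
    "Jets": "AFC East", "Dolphins": "AFC East", "Patriots": "AFC East",
    "Ravens": "AFC Central", "Titans": "AFC Central", "Steelers": "AFC Central",
    "Raiders": "AFC West", "Broncos": "AFC West",
}
CONFERENCE = {team: div.split()[0] for team, div in DIVISION.items()}

def tally(games):
    """Return {team: {"overall": [W, L], "div": [W, L], "conf": [W, L]}}.

    `games` is a list of (home, away, home_pts, away_pts) tuples.
    Divisional games also count toward the conference record, as in the NFL.
    """
    rec = defaultdict(lambda: {"overall": [0, 0], "div": [0, 0], "conf": [0, 0]})
    for home, away, home_pts, away_pts in games:
        winner, loser = (home, away) if home_pts > away_pts else (away, home)
        rec[winner]["overall"][0] += 1
        rec[loser]["overall"][1] += 1
        if DIVISION.get(home) == DIVISION.get(away):
            rec[winner]["div"][0] += 1
            rec[loser]["div"][1] += 1
        if CONFERENCE.get(home) == CONFERENCE.get(away):
            rec[winner]["conf"][0] += 1
            rec[loser]["conf"][1] += 1
    return rec

# Hypothetical sample results
games = [("Jets", "Dolphins", 24, 17), ("Steelers", "Raiders", 21, 10)]
records = tally(games)
```

Nothing fancy, but it shows why the divisional and conference columns collapse if the underlying win-loss transcription is wrong: they’re derived entirely from the same per-game data.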
Here was my prompt:
I have screenshots of 16 weeks of games played on Madden 2001 and I need to know the overall records, divisional records, conference records for all AFC teams. Additionally, the outcomes of any head-to-head games between the following teams: the Jets, Broncos, Raiders, Dolphins, Patriots, Ravens, Titans, and Steelers.
I initially thought everything was going fine. I was feeding it screenshots from the first 10 weeks of the season and it was chugging along after a few missteps and re-directions. Believing I had adequately finessed my prompt parameters and avoided what would surely be an arduous manual transcription project, I was ready to join the AI cult and sell my children for more GPU credits.

1 of 64 screenshots uploaded to ChatGPT for transcription
Unfortunately, I mistakenly pressed the stop button thinking I could upload my next set of images, and all the data it had amassed was lost. That’s on me—and who knows, it could have been perfect. I asked ChatGPT to try and recover that data and continue the process, and it could not do so successfully. So I asked it to try rebuilding the data based on the points above, and I got the following when I checked in at week 11:

Stats of the 2000-01 season through week 11, notice anything glaring?
Turns out it was registering wildly divergent data on games played that was just flat-out wrong. I asked how it could have only 4 games overall for an NFL team in week 11, and it said, happily, “You’re right, my mistake, let me fix that.” This went on for a number of back-and-forths, and then I just started from scratch because everything was a mess. When I re-did the process, I checked week-by-week and realized all the overall win-loss record data was still off, which meant the conference and divisional records were totally unreliable. I tried providing updated records for certain teams, but within a week or two they were off again.
In short, it quickly became too much: I had to constantly check the data, and what was supposed to save time was becoming just as laborious. No biggie; I’m sure my prompt had its limits, and I may be using this tool for things it doesn’t do well. But the thing that annoyed me was how cocksure it was of any and all of its results. Everything was fine until I questioned it, and then it was like “you’re right, that’s a problem, we should have, blah blah blah,” like some sycophantic poser who doesn’t want to be found out for not really understanding anything. It was even making simple factual mistakes, like saying the Miami Dolphins played the Pittsburgh Steelers in the 2000-01 regular season (even making up a fake score):

Basic factual errors were rampant when asking ChatGPT about the 2000-01 NFL schedule. Turns out I was right to question everything 😉
Anyway, after two hours of this I was so skeptical that I couldn’t really trust any data spanning multiple weeks without checking everything, and by that point it was easier to transcribe it all manually into a spreadsheet myself. I mean, the potential line-up for the 2000-01 NFL playoffs hung in the balance, so the stakes were high!
Alas, I spent several hours manually going through each game for each week, figuring out every squad’s division and conference records, as well as checking on any head-to-head games that might impact the playoff picture. Why didn’t you include this data, Madden 2001?!

Asking AI for a breakdown of how common games are handled when two teams have the same overall, divisional, and conference records and the decision comes down to common games for a playoff berth.
I did appreciate the breakdown ChatGPT provided of the NFL tiebreaker rules: when two teams are tied on overall, divisional, and conference wins and losses, the decision can come down to win percentage in common games (which is on the table for two teams in the NFC this season). But whether or not I can rely on its reading of the rules is a definite question at this point.
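The common-games step itself is simple arithmetic once the underlying records are trustworthy. Here’s a hedged sketch (the function name and data layout are hypothetical, and this only illustrates the win-percentage-over-common-opponents idea, not the NFL’s full tiebreaker procedure):

```python
def common_games_pct(results_a, results_b):
    """Win percentage for each team, counting only games against
    opponents that BOTH teams played.

    `results_a` / `results_b` map opponent name -> list of 'W'/'L'
    outcomes for that team against that opponent.
    """
    common = set(results_a) & set(results_b)

    def pct(results):
        outcomes = [o for opp in common for o in results[opp]]
        return outcomes.count("W") / len(outcomes) if outcomes else 0.0

    return pct(results_a), pct(results_b)

# Hypothetical season slices: common opponents here are Bears and Lions.
a = {"Bears": ["W"], "Lions": ["W", "L"], "Packers": ["L"]}
b = {"Bears": ["L"], "Lions": ["W", "L"], "Vikings": ["W"]}
pct_a, pct_b = common_games_pct(a, b)
```

Games against opponents only one of the two teams faced (Packers, Vikings above) are deliberately excluded, which is the whole point of the common-games comparison.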
After playing around like this I can easily see how seductive AI is as a tool: you simply ask questions and it breaks things down for you neatly, sparing you any messy web search. That said, after this experience I have a hard time justifying using it for data as important as the results of a simulated 25-year-old NFL season randomly generated by an emulated PlayStation game.
Here’s another question I asked while trying to figure out whether the Steelers have a better conference win-loss percentage than the Dolphins. Trying to confirm the Steelers played 13 conference games, not 12, I asked ChatGPT:

How many AFC conference games did the Steelers play in the 2000-01 season?
Really? I double-checked each game through 16 weeks and I was sure it was 13 games, so I prompted it to check again:

After questioning the certainty of its answer about the Steelers playing 12 conference games in 2000-01, it finally caved!
Sometimes it’s good to be right! Apart from that, it’s interesting how being wrong is never really an issue. The ChatGPT bot takes on a chipper, annoying persona filled with a sense of “self”-assurance that belies just how wrong it often is. I would much prefer that it grovel at my feet, or offer me bonus credits for wasting my time. Even better, apologize profusely while looking down at its imagined feet. And for a total coup de grâce, I would love it if it started mulling whether or not it should be doing any of this at all, or even taking potshots at me for doing something as stupid as trying to track data from a 25-year-old imagined football season no one, save me, will ever care about. I’m not mad it’s wrong; I’m mad how wrong it is.
______________________________
*Madden 2001 does not track conference or divisional records, only overall records. With an extremely close season in terms of overall records, I needed more details to get a sense of the playoff picture as we head into week 17.
They should rename it Eddie Haskell GPT. ‘Cuz that delivery, man, vs. the results. It’s just that ingenious: “Hello Mr. Cleaver, Mrs. Cleaver, I was just talking with Wallace, your son here, and we’ve both come to the conclusion that he and I should be allowed to drive YOUR car after midnight once the dance lets out this Saturday.”
It makes one wonder what “crap” ChatGPT is telling its little LLMs in the model, behind our backs (just like Eddie Haskell), when we’re not looking. Probably something like a Mark Zuckerberg-style “Dumb F***s.”
Will there be a future diorama where you bury Altman’s head in the sand and force him to watch the inane brute force screen beating it takes to use his product?
I have to admit some scrolling through the details, but I must eagerly applaud an honest recounting of an encounter with ChadGPT. I remember an AI evangelist ed-tech blogger who boasted in a post, “A task that took me two weeks before AI was done in 1 hour.” No detail, not even a mention of what the task was. Or the ones who just share the final product without being honest about how much time and re-prompting and shaking the box it takes just to get a candy bar from the vending machine.
The whole mantra of “it saves time” is the biggest snake-oil pile of BS; there is no real method of measuring time saved in cognitive tasks.
But I digress.
The whole problem with using GenAI is expecting it to be “right” or “factual.” I use it where it excels: making **** up, or where it’s nearly impossible to be wrong. It’s great for testing web forms where I need BS text; more fun than lorem ipsum.
Yeah, it has worked a few times to sort free-form text, or to do a complex conversion of units.
I respect what Tom does and what he’s been sharing about how he does it. That’s Woodward doing it the way he always does things.
Here’s the thing: I just do not want to build, work, or create in this style. It offends my soul.
I am not against the tech; it’s the humans hawking it.