5 Comments
Sep 19 · Liked by George H.

My experience applying large language models is similar to yours: the models exhibit poor accuracy out-of-the-box on watch-related tasks. But they perform _incredibly_ well at data analysis tasks, so I'm not surprised to see Sonnet 3.5 complete the task easily given some expert-curated listings.

Going a step further: while most LLMs cannot access the web, Perplexity can. Unfortunately the initial results aren't amazing, but web scraping is a solved problem (either with software or AI): https://www.perplexity.ai/search/please-pull-prices-for-cartier-1B.T1OMnSwe5u5xCZSzfJA

So there's certainly something here. It would be a short jump to incorporate forum listings via WatchCharts API access (https://watchcharts.com/api). One could even access the Chrono24 API *directly*, although I don't want to link those projects and risk taking them down.
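The core of that pipeline is simple once you have listing data in hand: pull prices, then summarize. Here's a rough sketch, assuming listings have already been fetched (the records and field names below are illustrative, not the actual WatchCharts or Chrono24 API schema):

```python
import statistics

# Hypothetical listing records, e.g. parsed from an expert-curated
# export -- the field names are made up for illustration.
listings = [
    {"model": "Cartier Santos", "price_usd": 6800},
    {"model": "Cartier Santos", "price_usd": 7150},
    {"model": "Cartier Santos", "price_usd": 6500},
]

prices = [l["price_usd"] for l in listings]
print(f"n={len(prices)}  mean=${statistics.mean(prices):,.0f}  "
      f"median=${statistics.median(prices):,.0f}")
```

The analysis step is trivial; the hard part, as noted above, is getting clean listings in the first place.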

author

By the way, even Claude, which in my experience is more accurate for the tasks I give it, makes mistakes: it left out two listings it was supposed to include in the initial query. (That's why the mean in the first screenshot is different; that screenshot was taken before I asked it to correct the output and include those two listings.) But over time I'm confident that LLMs will become more accurate.

author

Thanks A., I had a lot of trouble with perplexity.ai (and with ChatGPT too, by the way). It kept "pulling" non-existent listings, so I gave up, did the download part on my own, and fed the data to Claude, which in my experience makes fewer mistakes. The key would be to build something where people comment and each comment corresponds to a specific listing. I guess you could build a standalone Reddit-style commenting site on top of the Chrono24 API, but I'm not sure how robust it would be.
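The data model for that kind of site is straightforward: key each comment thread to a marketplace listing ID. A minimal sketch, assuming hypothetical listing IDs and fields (nothing here is from the actual Chrono24 API):

```python
from dataclasses import dataclass, field

# Illustrative data model only -- the listing ID format and fields
# are invented for this sketch.
@dataclass
class ListingThread:
    listing_id: str   # the marketplace's listing identifier
    title: str
    comments: list = field(default_factory=list)

    def add_comment(self, user: str, text: str) -> None:
        self.comments.append({"user": user, "text": text})

threads: dict[str, ListingThread] = {}

def comment_on(listing_id: str, title: str, user: str, text: str) -> ListingThread:
    # Create the thread on first comment, then append.
    thread = threads.setdefault(listing_id, ListingThread(listing_id, title))
    thread.add_comment(user, text)
    return thread

t = comment_on("C24-12345", "Cartier Santos WSSA0018",
               "george", "Price seems fair for a full set.")
print(len(t.comments))  # 1
```

The fragile part isn't this structure but keeping it in sync with the marketplace, since listings get sold or delisted and the IDs go stale.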


Hi George, I left you a message on IG about this post. Mentioning it here in case you haven’t yet seen it.

author

Thanks just saw your message and replied 👍🏻
