Goldman Sachs Research estimates that as many as 300 million jobs could be replaced by artificial intelligence tools such as large language models (LLMs). But can current LLMs such as ChatGPT (hereafter CGPT) replace vintage watch dealers and other participants in the vintage watch (or even non-watch) market? This is a multi-part series on this topic.
Part 1: Can ChatGPT Become a Vintage Watch Dealer?
Now that CGPT is multimodal and image-enabled, can it replace vintage watch dealers? Firstly, we must define the tasks of a human vintage watch dealer:
Developing relationships with other dealers, clients, vendors and other participants in the vintage watch market
Identifying, authenticating and pricing vintage watches on offer and to sell
Repairing and/or refurbishing
Photographing and listing inventory on various platforms
Physically attending trade shows and other venues
Marketing themselves
Setting trends (hyping watches)
Retail shop floor management, including returns
Inventory management
Logistics, including packaging and shipping
Bookkeeping and funds management
Security and insurance management
License and permit management
Many of these tasks have a physical component, which would require an AI model such as a robotic transformer controlling a physical robot. So the quick answer to the question “Can CGPT (or any other AI-powered application or LLM) replace a vintage watch dealer?” is no, at least not in 2023. The same is true of many other professions, where AI such as CGPT is better suited as an assistant than as an outright replacement.
AI’s most valuable attribute is its ability to synthesize vast amounts of data. Hence in the vintage watch world, data- and knowledge-based tasks such as identifying, authenticating and pricing watches would probably be the first high-value tasks to be replaced by AI. (Using CGPT to quickly write watch descriptions and marketing materials is another, but it’s not a very high-value task, so I’m skipping it in this post.) Can CGPT, a large language model and currently AI’s most popular application, perform these tasks well? Let’s find out.
Below are screenshots of my conversation with CGPT 4.0 (via the iPhone app) about a correct Rolex GMT Master 1675 Mk 3. For this post we will focus on identification, basic authentication and pricing; more complicated authentication will be covered in Part 2.
Identification & Authentication
We first start out with identification. Was CGPT able to identify the watch?
Kind of. CGPT recognized the watch as a Rolex GMT Master, but could only tell me where to find the reference number rather than identify it. It also makes sure to include subtle disclaimers, urging me to “consult with a watch specialist or expert.”
The 1675s were first manufactured with glossy dials and later transitioned to matte dials. There are several versions of both the glossy and matte dials, usually referred to as Mark 1, 2, etc. The watch in question has a Mark 3 matte dial, also called the “Radial” dial because its hour indexes sit closer to the center of the dial. Could CGPT “see” the picture and identify it? Radial dials were fitted to 1675s in the 41xxxxx – 54xxxxx case serial range. Given a serial in this range (but not this watch’s actual serial), could it judge whether the serial was correct for this dial?
CGPT correctly identifies the dial as matte due to its “lack of sheen”, which is an excellent start. It is also able to narrow the dial down to a Mark 2 or 3, then selects the correct answer when nudged. Unfortunately, its reasoning is wrong and confuses features of other dials. For example, the Mark 2 is not known for an elongated E; that would be the Mark 1. I also don’t understand its obsession with open 6s, since there are none on the dial. When I mention the hour plots, it somehow lands on the correct answer, but again for the wrong reason: it should tell me that the lume plots are positioned closer to the center of the dial, but instead it fixates on the word “elongated” and claims that the “elongated plots” make it a Mark 3. It does, however, understand that the serial is within the correct Mark 3 range.
And how about the bracelet and bezel? Are they correct for this reference?
As with the dial identification above, when asked to analyze a photo, CGPT somehow lands on the correct answer, but partly for the wrong reasons. It cites open 6s as one reason the bezel is correct, but there are no 6s on the bezel, let alone open 6s. CGPT also mentions a “specific numerical style” as a reason, but this vague answer denies us a window into its method of assessment. It does a better job authenticating the bracelet, which reflects its stronger ability to synthesize text-based information.
How about the controversial question of case polishing? This example has a near mint condition case with sharp bevels that appears unpolished.
Pretty good assessment! It is a mystery why CGPT can assess case condition more accurately than it can determine the dial version, given that polishing is usually more difficult to assess.
Due to the length limitation on Substack posts I’m not attaching the screenshots, but I also asked CGPT to identify and authenticate the movement and inner caseback, which it did correctly. It was also able to read the “1675” engraving on the caseback and confirmed it was correct for this serial range.
Pricing
Pricing is complicated and varies with several parameters, including location and whether the sale is dealer-to-collector or dealer-to-dealer. (Unfortunately, due to post length I’m only able to attach one more screenshot.) How did CGPT do?
A quick search on Chrono24 for Radial 1675s without accessories in similar condition shows an average retail price of approximately $30K, so CGPT’s price range is 10-20% higher. When asked about dealer-to-dealer pricing for this watch, CGPT replied with $28-32K, implying that it thinks dealers earn a 14-20% margin. Overall, not a bad result. I used the word “pristine” when describing the condition, which may have pushed its estimate up. Also note that the training cutoff for this version of CGPT was listed as January 2022, so its prices may not reflect the current market.
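The percentages above are simple arithmetic, and it helps to be explicit about which base each one uses. Below is a minimal Python sketch, assuming the ~$30K comparable retail price, the 10-20% premium and the $28-32K dealer-to-dealer quote described above; the function names are hypothetical. It also separates markup (profit as a share of the buy price) from margin (profit as a share of the sell price), since the two give different percentages for the same pair of prices.

```python
def premium_range(reference: float, low_pct: float, high_pct: float) -> tuple[float, float]:
    """Price range implied by quoting low_pct..high_pct above a reference price."""
    return reference * (1 + low_pct), reference * (1 + high_pct)


def markup(buy: float, sell: float) -> float:
    """Profit as a fraction of the buy (cost) price."""
    return (sell - buy) / buy


def margin(buy: float, sell: float) -> float:
    """Profit as a fraction of the sell price."""
    return (sell - buy) / sell


def midpoint(low: float, high: float) -> float:
    return (low + high) / 2


# Assumed inputs from the text: ~$30K comparable retail price on Chrono24,
# CGPT's retail quote 10-20% above that, and its $28K-$32K dealer-to-dealer quote.
retail_low, retail_high = premium_range(30_000, 0.10, 0.20)
d2d_mid = midpoint(28_000, 32_000)
retail_mid = midpoint(retail_low, retail_high)

print(f"Implied retail quote: ${retail_low:,.0f}-${retail_high:,.0f}")
print(f"Markup at midpoints: {markup(d2d_mid, retail_mid):.1%}")
print(f"Margin at midpoints: {margin(d2d_mid, retail_mid):.1%}")
```

At the midpoints ($30K buy, $34.5K sell) this gives a 15% markup but only about a 13% margin, so it is worth stating which base a “margin” figure uses.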
Verdict
So can CGPT be a vintage watch dealer that identifies, authenticates and prices watches? Based on the above results, the quick answer is probably “well, kind of, but I wouldn’t trust it.” However, to properly evaluate its performance we need to dig a little deeper and compare it to how we traditionally identify, authenticate and price.
The Traditional Way
When encountering an unfamiliar reference (or a familiar reference with an unusual characteristic), most dealers will simply Google it to identify the watch. For most watches, this 5-10 minute search will reveal basic information such as market price and manufacturing era. The dealer will also look for multiple instances of the watch in question from several credible sources (watch magazines, other resellers, auctions, etc.) to gain confidence in its authenticity and its liquidity in the market. This is especially important when the watch has an unusual characteristic such as a distinctive dial script or case material.
Finally, a dealer can also Google articles or tutorials on how to identify or authenticate a certain reference or model. For example, where should one look for the serial number? How does one know whether a dial is correct for its production year?
Google is excellent at providing the WHAT and HOW. What it cannot provide is DO. By DO, I mean synthesize the what and how information to actually perform the identification, authentication and pricing tasks on the dealer’s behalf.
Fuzzy Processor, Not an Oracle
DO is what we hoped (and still hope) CGPT could do. LLMs should be able to synthesize all the publicly available information about watches and become a Vintage Watch Oracle.
Although CGPT somehow landed on the right answers in our exercise above, its reasoning was sometimes incorrect, which casts doubt on its ability to perform the three tasks we gave it. In fact, when I asked CGPT again to identify the same watch using different prompts, I got different answers, one of which said the dial was a Mark 2. What causes these inaccuracies?
First, CGPT and LLMs in general are not deterministic in practice: they are fuzzy processors that predict what text should come next in the conversation, and their replies are sampled from those predictions, so the same question can produce different answers. This is probably further complicated by limitations in CGPT’s recently introduced image comprehension, which has seen far less public testing than its text abilities. Based on its responses, CGPT seems able to “see” some things (sheen on the dial, sharp edges, etc.) while either failing to see or outright making up others (open 6s, elongated hour markers).
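To make the “fuzzy processor” point concrete, here is a toy Python sketch of how sampled decoding can give different answers to the same question. The candidate answers and their scores (logits) are hypothetical numbers chosen for illustration, not anything measured from CGPT, and a real model samples token by token rather than choosing between whole answers. With temperature 0 (greedy decoding) the top-scoring answer always wins; with temperature 1, a less likely answer such as “Mark 2” still gets picked some of the time.

```python
import math
import random
from collections import Counter


def softmax(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def answer(candidates: list[str], logits: list[float], temperature: float,
           rng: random.Random) -> str:
    """Pick one answer: greedily at temperature 0, otherwise by weighted sampling."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return candidates[best]
    probs = softmax(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical scores for "which dial is this?": Mark 3 is favoured but not certain.
CANDIDATES = ["Mark 3", "Mark 2", "Mark 1"]
LOGITS = [2.0, 1.5, 0.2]

rng = random.Random(0)  # fixed seed so the demo is repeatable
greedy = Counter(answer(CANDIDATES, LOGITS, 0, rng) for _ in range(20))
sampled = Counter(answer(CANDIDATES, LOGITS, 1.0, rng) for _ in range(20))
print("temperature 0:", dict(greedy))
print("temperature 1:", dict(sampled))
```

Asking the same question repeatedly at a nonzero temperature is, in this toy model, one plausible way a single watch photo can come back as a Mark 3 in one session and a Mark 2 in another.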
That said, CGPT did very well on serial range and bracelet code authentication. This is probably because both tasks only require it to recall a simple two-dimensional text lookup table (for example, dial version against case serial range), which is easier for an LLM to recall correctly when prompted.
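A lookup of this kind is trivial to write down, which helps explain why CGPT handled it well. Below is a minimal Python sketch using the only range given in this post, the Radial (Mark 3 matte) dial for the 1675, read as case serials 4,100,000 through 5,499,999; that reading of “41xxxxx – 54xxxxx” is an assumption. The table name and function are hypothetical, and any further rows would need to come from a reliable reference rather than from CGPT.

```python
from typing import Optional

# Hypothetical lookup table: (reference, dial variant) -> inclusive case serial range.
# The single entry reflects the range quoted in this post ("41xxxxx - 54xxxxx").
SERIAL_RANGES: dict[tuple[str, str], tuple[int, int]] = {
    ("1675", "Mark 3 matte (Radial)"): (4_100_000, 5_499_999),
}


def serial_matches_dial(reference: str, dial: str, serial: int) -> Optional[bool]:
    """Return True/False if the combination is in the table, or None if we have no data."""
    serial_range = SERIAL_RANGES.get((reference, dial))
    if serial_range is None:
        return None  # unknown reference/dial combination: don't guess
    low, high = serial_range
    return low <= serial <= high


print(serial_matches_dial("1675", "Mark 3 matte (Radial)", 4_600_000))  # True
```

Returning None for an unknown combination, rather than False, keeps “no data” distinct from “wrong for this dial”, which is the distinction a dealer cares about.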
Teacher Learning from the Pupil
If CGPT is inaccurate or outright fabricates answers, what is it good for? It is definitely not a tool that takes pictures of watches as input and spits out the information I need. That kind of app would need further advances in computer vision and a well-curated, annotated and engineered database of standardized photos, not to mention a vast repository of historical price data: a huge and unprofitable undertaking.
CGPT excels at helping the watch dealer learn to identify, authenticate and price through conversation. It’s like a teacher (the watch dealer) learning from a pupil (CGPT): the teacher and pupil converse, the pupil provides naive answers, and the teacher learns while guiding the pupil to the correct answer. In fact, I was using CGPT incorrectly in this exercise. It would have been better to ask CGPT for the proper steps to identify, authenticate and price a Rolex GMT Master, and then learn by guiding it in what I gradually sense is the right direction. This type of interaction would especially benefit new collectors and help reduce information asymmetry in the vintage watch market.
Still, I’m optimistic that CGPT or its derivatives will eventually answer identification, authentication and pricing questions more accurately, especially if its ability to “see” pictures improves. Training on better data and more conversations with watch dealers would also improve its accuracy. One day we may see dealers at trade shows leaning over display tables, a watch in one hand and a voice-activated VintageWatchCGPT in the other!