What's wrong with it is that they didn't define the two watches clearly. It's natural to assume they meant identical watches (my watch is more massive than my wife's, ticking or not), but they don't specify what state each watch is in. Really, though, we only need one watch and an arbitrarily accurate scale to show what's going on.
Let's say our watch starts with a fully charged but disconnected battery. At this point it is not ticking. We weigh the watch to determine its non-ticking mass. Then we connect the battery and weigh the watch while it's ticking. Aside from small vibrations due to the clockwork, the scale will not show a higher mass; in fact, the reading will decline slowly as the watch runs, until the battery "dies" and no more current flows. The watch is once again not ticking, but now it is less massive than before.
The mass does not increase when the battery is connected because the kinetic energy of the clockwork, which must be included in a full inventory of the mass-energy of the system, comes from the chemical potential energy stored in the charged battery, which must also be included. Connecting the battery only moves energy from one form to another inside the watch, so the total is unchanged at that moment. As the watch runs, that kinetic energy is dissipated as heat, which is lost to the environment, reducing the overall mass-energy of the watch.
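To get a feel for the size of the effect, here is a rough back-of-the-envelope sketch using E = mc^2. The battery figures (1.55 V, 30 mAh, roughly what a small silver-oxide watch cell might hold) are my own illustrative assumptions, not numbers from the question, and the function names are just made up for this example. It also assumes the voltage stays constant over the whole discharge and that all the stored energy eventually leaves the watch as heat.

```python
# Rough estimate of how much mass a watch loses as its battery discharges.
# The battery figures below are illustrative assumptions, not measurements.

C = 299_792_458.0  # speed of light in m/s (exact, by SI definition)


def battery_energy_joules(voltage_v: float, capacity_mah: float) -> float:
    """Stored energy, treating the voltage as constant over the discharge."""
    # mAh -> Ah -> coulombs (1 Ah = 3600 C), then energy = voltage * charge
    return voltage_v * (capacity_mah / 1000.0) * 3600.0


def mass_equivalent_kg(energy_joules: float) -> float:
    """Mass equivalent of an energy via E = m c^2."""
    return energy_joules / C**2


# Assumed values for a small watch cell (hypothetical, for illustration only)
energy = battery_energy_joules(voltage_v=1.55, capacity_mah=30.0)
delta_m = mass_equivalent_kg(energy)

print(f"stored energy: {energy:.1f} J")
print(f"mass lost over a full discharge: {delta_m:.2e} kg")
```

With these assumed numbers the difference comes out at around 2 picograms, which is why the scale has to be "arbitrarily accurate" for the thought experiment to work.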
Basically, it comes down to this: a charged battery is more massive than an identical depleted one, because of the energy stored as chemical potential. So the main mistake was really unclear writing. Technically they did mention the "intrinsic potential energy" as a factor, but that only implies a charged battery without being explicit about why charged vs. uncharged matters.