Add a DownloadResult object to standardize logging of download success (or lack thereof). Make sure all download attempts return such an object, so every attempt gets logged, and log the original HREF, not the one rewritten to point at the local copy. There are still some architectural improvements to make, and I added a couple of TODO items, but it's a step in the right direction.
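A minimal sketch of what such a result object might look like; the field and method names here are my assumptions, not the project's actual API:

```java
// Hypothetical sketch of a DownloadResult value object.
class DownloadResult {
    private final String originalHref;  // the HREF as written in the source HTML
    private final boolean success;
    private final String failureReason; // null on success

    DownloadResult(String originalHref, boolean success, String failureReason) {
        this.originalHref = originalHref;
        this.success = success;
        this.failureReason = failureReason;
    }

    String getOriginalHref() { return originalHref; }
    boolean isSuccess() { return success; }

    @Override
    public String toString() {
        return (success ? "OK " : "FAIL ") + originalHref
                + (failureReason == null ? "" : " (" + failureReason + ")");
    }
}
```

The point of the value object is that every code path that attempts a download has to produce one, so nothing slips past the logger silently.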
Include working links in the DB logging. It should also include the response code (so we can filter failed links by the reason for failure) and the file extension, where it can be determined. Examining the text logs manually, it appears we have a fair number of false failures, e.g. a 429 for too many requests, or in one case somehow a -1 for a URL that worked when tried by hand.
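A sketch of how the log entry might carry the response code and a guessed extension; the class and helper names are my own, and the extension guess is a rough heuristic, not the project's actual logic:

```java
// Sketch of a richer DB log entry; names here are assumptions.
class DownloadLogEntry {
    final String href;
    final int responseCode;  // -1 when no HTTP response was received at all
    final String extension;  // null when it couldn't be determined

    DownloadLogEntry(String href, int responseCode, String extension) {
        this.href = href;
        this.responseCode = responseCode;
        this.extension = extension;
    }

    // 429 (Too Many Requests) and -1 often succeed on a later retry,
    // so flag them as suspect rather than hard failures.
    boolean isLikelyFalseFailure() {
        return responseCode == 429 || responseCode == -1;
    }

    // Rough heuristic: whatever follows the last dot in the URL path,
    // provided that dot comes after the last slash.
    static String guessExtension(String href) {
        int q = href.indexOf('?');
        String path = (q >= 0) ? href.substring(0, q) : href;
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        return (dot > slash) ? path.substring(dot + 1) : null;
    }
}
```

Recording the code makes it easy to query for the 429s later and retry just those, instead of treating them the same as genuine 404s.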
Validation mode will now check both the poorly-sanitized and well-sanitized versions of the HTML file name. Unfortunately, I've realized it's not all that rare for threads to have duplicate names, which throws off our validation: it sees that the reply count is wrong and re-downloads the duplicate, and then the original. In at least one case there's even a third... last one wins. Ugh. We definitely need the slug in the folder name. At least I learned that you can label a try block in Java and break out of it...
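For the curious, the labeled-break trick looks roughly like this; the candidate file names and the `contains()` check are stand-ins for a real existence test, not the actual validation code:

```java
// Illustration only: a label on a try statement, exited with `break label`.
class LabeledBreakDemo {
    static String findExisting() {
        String found = null;
        search:
        try {
            // Check both sanitized variants of the file name.
            String[] candidates = { "Thread_Name.html", "Thread Name.html" };
            for (String name : candidates) {
                if (name.contains(" ")) {   // pretend this is "file exists?"
                    found = name;
                    break search;           // jumps past the rest of the try
                }
            }
        } finally {
            // a finally block still runs even when `break search` exits the try
        }
        return found;
    }
}
```

Java lets you label any statement, not just loops, so `break search` is a tidy way to bail out of a try block early without an extra flag or exception.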
Solve one of the remaining thread-hang causes: set both connect *and* read timeouts when getting HttpURLConnections. I hadn't realized read timeouts were a separate thing, but on some URLs, such as http://www.dhl-usa.com/images/truck.gif, you will hit them, and by default the read will wait forever. I'm not sure this will solve all the remaining thread hangs, but it should fix some of them.
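Setting both timeouts is a two-line change on the connection. The specific values below are placeholders, not the project's real settings:

```java
import java.net.HttpURLConnection;
import java.net.URL;

class TimeoutDemo {
    // setConnectTimeout alone is not enough: once the socket is connected,
    // a read that the server never answers will still block forever.
    static HttpURLConnection openWithTimeouts(String href) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(href).openConnection();
        conn.setConnectTimeout(10_000); // give up if the TCP connect takes > 10 s
        conn.setReadTimeout(30_000);    // give up if the response body stalls > 30 s
        return conn;
    }
}
```

Both setters take milliseconds, and the default for each is 0, meaning no timeout at all, which is exactly how the hang happens.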
Revamp the timeout/dead-thread settings so a thread only counts as dead if it hasn't made any progress in 120 seconds (with a warning at 60 seconds). This has two benefits:
- Threads that are just slow, due to connection timeouts or lots of images, won't be killed.
- Threads that really are dead will be stopped after 2 minutes, instead of potentially much, much longer for a gigantic thread.
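The progress-based liveness check could be modeled like this; the class itself is my own sketch (with explicit time parameters so it's testable), and only the 60/120-second thresholds come from the change above:

```java
// Sketch of progress-based liveness tracking.
class ProgressWatchdog {
    static final long WARN_MS = 60_000;  // warn after 60 s without progress
    static final long DEAD_MS = 120_000; // declare dead after 120 s without progress

    private volatile long lastProgressMs;

    ProgressWatchdog(long nowMs) { this.lastProgressMs = nowMs; }

    // Call whenever the worker completes any unit of work (e.g. one image),
    // which resets both clocks.
    void recordProgress(long nowMs) { lastProgressMs = nowMs; }

    boolean shouldWarn(long nowMs) { return nowMs - lastProgressMs >= WARN_MS; }
    boolean isDead(long nowMs)     { return nowMs - lastProgressMs >= DEAD_MS; }
}
```

Because the clock resets on every unit of progress, a slow-but-alive worker can run indefinitely, while a truly stuck one is capped at two minutes regardless of how large the thread is.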