HTTrack error on my site
Posted by Nounours18200 (Member, 248 posts)
Hello,
I am looking to use HTTrack to download the entire content of my WordPress website whose URL is threshold-lovers.com onto my PC.
I should mention that since the site is mine, I can modify any WordPress file that might be preventing HTTrack from working, if need be.
My last attempt resulted in the following error:
--------------
HTTrack3.49-2+htsswf+htsjava launched on Sun, 20 Aug 2023 21:47:03 at www.threshold-lovers.com +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
(winhttrack -qiC2%Ps2u1%s%uN0%I0p3DaK0H0%kf2A25000%f#f -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F "<!-- Mirrored from %s%s by HTTrack Website Copier/3.x [XR&CO'2014], %s -->" -%l "fr, en, *" www.threshold-lovers.com -O1 "C:\Mes Sites Web\essai1" +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private
21:47:04 Warning: Moved Permanently for www.threshold-lovers.com/robots.txt
21:47:04 Warning: Redirected link is identical because of 'URL Hack' option: www.threshold-lovers.com/robots.txt and <https://www.threshold-lovers.com/robots.txt>
21:47:04 Warning: Warning moved treated for www.threshold-lovers.com/robots.txt (real one is <https://www.threshold-lovers.com/robots.txt>)
21:47:04 Warning: Moved Permanently for www.threshold-lovers.com/
21:47:04 Warning: Redirected link is identical because of 'URL Hack' option: www.threshold-lovers.com/ and <https://www.threshold-lovers.com/>
21:47:04 Warning: File has moved from www.threshold-lovers.com/ to <https://www.threshold-lovers.com/>
21:47:04 Warning: No data seems to have been transferred during this session! : restoring previous one!
-----------------------------------------------
Does anyone have a solution?
Thank you
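Editor's note: in the log above, every warning is a permanent redirect from the bare address to the same address under https://, and no data is transferred afterwards. A minimal Python sketch for pulling those redirects out of an hts-log.txt excerpt is shown here. The function names (find_redirects, only_scheme_changed) are made up for this illustration and are not part of HTTrack. If the only change is the scheme, one thing that may be worth trying is starting the mirror from the https:// address directly.

```python
import re

# Matches HTTrack warnings of the form
# "File has moved from <old> to <new>", with or without <...> around the target.
MOVED_RE = re.compile(r"File has moved from (\S+) to <?([^>\s]+)>?")


def find_redirects(log_text):
    """Return (old, new) pairs for every 'File has moved' warning in the log."""
    return MOVED_RE.findall(log_text)


def strip_scheme(url):
    """Drop a leading http:// or https:// so two addresses can be compared."""
    return re.sub(r"^https?://", "", url)


def only_scheme_changed(old, new):
    """True when the redirect target is the same address served over HTTPS."""
    return new.startswith("https://") and strip_scheme(old) == strip_scheme(new)


# Two lines copied from the log in the question.
sample = (
    "21:47:04 Warning: Moved Permanently for www.threshold-lovers.com/\n"
    "21:47:04 Warning: File has moved from www.threshold-lovers.com/ "
    "to <https://www.threshold-lovers.com/>\n"
)

for old, new in find_redirects(sample):
    if only_scheme_changed(old, new):
        print(f"{old} only redirects to HTTPS; try starting the mirror from {new}")
```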
4 answers
-
Good evening,
Could this possibly come from your robots.txt file?
See you later
-
Hello,
HTTrack only lets you retrieve the generated HTML code. I don't understand the point of using it for a WordPress site.
Why not back up your site? There are plugins for that, or you can dump the database and retrieve the source files over FTP.
Anyway, as for what is blocking you, it could come from a .htaccess file.
Kind regards,
Jordane
-
Wouldn't that come from your robots.txt file?
Yes, maybe, but I have no idea how to modify it.
Why not back up your site? There are plugins for that, or you can dump the database and retrieve the source files over FTP.
I do that too, but we have trouble getting the site to run locally on a PC with WAMP or XAMPP, which is why I'm trying to scrape it with HTTrack.
Since the site belongs to us, we can modify (at least temporarily) any file that prevents this scraping, but we need to know which files, or what needs to be done.
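Editor's note: on the robots.txt question, the log in the first post only shows robots.txt being redirected, not a rule being applied, so it may not be the culprit. To check whether a given robots.txt would block a crawler, a minimal sketch with Python's standard urllib.robotparser is shown here. Both robots.txt samples are hypothetical, not the real file from the site, and using "HTTrack" as the user-agent token is an assumption about how the crawler identifies itself.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt content given as a string and test one URL against it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical file that shuts HTTrack out of the whole site.
blocking = """\
User-agent: HTTrack
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

# Hypothetical file that only protects the WordPress admin area.
permissive = """\
User-agent: *
Disallow: /wp-admin/
"""

home = "https://www.threshold-lovers.com/"
print("blocking file allows home page:", is_allowed(blocking, "HTTrack", home))
print("permissive file allows home page:", is_allowed(permissive, "HTTrack", home))
```

If the real file does contain a Disallow rule matching HTTrack, removing that rule temporarily is one option. Another, if I read the HTTrack option list correctly, is to run the mirror with --robots=0 so that robots.txt is not followed.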
-
Hello,
As @jordane45 indicated, HTTrack cannot retrieve the site's PHP sources or its database. At most it can retrieve a static .html version, which is just a snapshot of the pages at a given moment.
With FTP access to the site, you can copy all of its source files. However, these only work together with the database, so a database backup is needed as well. Alternatively, use WordPress's own features or plugins for transferring the site.
-
I have since found an alternative to HTTrack called "Cyotek" that perfectly mirrored my site, including images and links, just like HTTrack is supposed to do.
Since HTTrack has always made good local copies of websites for me, I assumed it would be capable of that here too.
I understand what you're saying about backups (like databases, for example), and I've been doing that for years, but if Cyotek perfectly copied my site, HTTrack should be able to do the same...