HTTrack error on my site

Nounours18200 Posted messages 248 Registration date   Status Member Last intervention   -  

Hello,

I am looking to use HTTrack to download the entire content of my WordPress website whose URL is threshold-lovers.com onto my PC.

Since the site is mine, I can modify any WordPress file that might be preventing HTTrack from working, if need be.

My last attempt resulted in the following error:

--------------
HTTrack3.49-2+htsswf+htsjava launched on Sun, 20 Aug 2023 21:47:03 at www.threshold-lovers.com +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
(winhttrack -qiC2%Ps2u1%s%uN0%I0p3DaK0H0%kf2A25000%f#f -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F "<!-- Mirrored from %s%s by HTTrack Website Copier/3.x [XR&CO'2014], %s -->" -%l "fr, en, *" www.threshold-lovers.com -O1 "C:\Mes Sites Web\essai1" +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar )

Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private

21:47:04 Warning: Moved Permanently for www.threshold-lovers.com/robots.txt
21:47:04 Warning: Redirected link is identical because of 'URL Hack' option: www.threshold-lovers.com/robots.txt and https://www.threshold-lovers.com/robots.txt
21:47:04 Warning: Warning moved treated for www.threshold-lovers.com/robots.txt (real one is https://www.threshold-lovers.com/robots.txt)
21:47:04 Warning: Moved Permanently for www.threshold-lovers.com/
21:47:04 Warning: Redirected link is identical because of 'URL Hack' option: www.threshold-lovers.com/ and https://www.threshold-lovers.com/
21:47:04 Warning: File has moved from www.threshold-lovers.com/ to https://www.threshold-lovers.com/
21:47:04 Warning: No data seems to have been transferred during this session! : restoring previous one!
-----------------------------------------------
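Every warning in the log above concerns a redirect from the scheme-less address to its https:// counterpart, after which nothing is transferred. A minimal Python sketch, assuming the log text is available as a string, pulls out the "File has moved" pairs so the final address is easy to spot. The function name `redirect_pairs` and the `sample` string are illustrative, not part of HTTrack. Starting the mirror from the https:// address directly is one thing to try, but that is an assumption and not something confirmed in this thread.

```python
import re

# Matches HTTrack's "File has moved from <old> to <new>" warning.
# Angle brackets around either URL are optional and are not captured.
MOVED = re.compile(r"File has moved from <?(\S+?)>? to <?(\S+?)>?(?:\s|$)")


def redirect_pairs(log_text):
    """Return a list of (old, new) URL pairs found in 'File has moved' warnings."""
    return MOVED.findall(log_text)


# Two lines copied from the log in the question.
sample = (
    "21:47:04 Warning: Moved Permanently for www.threshold-lovers.com/\n"
    "21:47:04 Warning: File has moved from www.threshold-lovers.com/ "
    "to https://www.threshold-lovers.com/\n"
)

for old, new in redirect_pairs(sample):
    print(f"{old} -> {new}")
```

If the pair shows an http-to-https move like this one, the https:// address is the candidate start URL for a new attempt.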

Does anyone have a solution?

Thank you

4 answers

  1. NonoM45 Posted messages 1018 Registration date   Status Member Last intervention   5
     

    Good evening,

    Could this be coming from your robots.txt file?

    See you later

  2. jordane45 Posted messages 30426 Registration date   Status Moderator Last intervention   4 830
     

    Hello,

    HTTrack only retrieves the generated HTML code. I don't see the point of using it on a WordPress site.

    Why not back up your site instead? There are plugins for that, or you can take a database dump and retrieve the source files over FTP.

    Anyway, as for what is blocking you, it could come from a .htaccess file.


    Kind regards,
    Jordane

  3. Nounours18200 Posted messages 248 Registration date   Status Member Last intervention   10
     

    "Wouldn't that come from your robots.txt file?"

    Yes, maybe, but I have no idea how to modify it.

    "Why not back up your site (there are plugins for that, or alternatively, do a database dump and retrieve the source files via FTP)?"

    I do that too, but we have trouble getting the site to run locally on a PC with WAMP or XAMPP, which is why I'm trying to scrape it with HTTrack.

    Since the site belongs to us, we can modify (at least temporarily) any file that would prevent this scraping, but we need to know which files, or what else needs to be done.
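One way to rule robots.txt in or out, as suggested earlier in the thread, is to test its rules offline. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content shown is hypothetical (the site's real file is not quoted in this thread), the helper name `allowed` is illustrative, and matching on the token "HTTrack" is an assumption about how the crawler would be named in such a file. Python's matching rules are also its own and may differ from HTTrack's.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
example = """\
User-agent: HTTrack
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

site = "https://www.threshold-lovers.com/"
print("HTTrack allowed:", allowed(example, "HTTrack", site))
print("Other crawler allowed:", allowed(example, "OtherBot", site))
```

Pasting the site's actual robots.txt into `example` shows whether a rule of this kind applies. If none does, the cause is more likely elsewhere, for example in the redirect reported in the log or in a .htaccess rule.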

    1. jee pee Posted messages 9437 Registration date   Status Moderator Last intervention   9 973
       

      Hello,

      As @jordane45 indicated, HTTrack cannot retrieve the site's PHP sources or its database. At most it can retrieve a static .html version, which is just a snapshot of the pages at a given moment.

      With FTP access to the site, you can copy all of its source files. However, they only work together with the database, so a backup of the database is needed as well. Alternatively, use WordPress functions or plugins for transferring the site.
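The database half of that backup can be sketched as follows, assuming a MySQL or MariaDB database and the `mysqldump` client installed on the machine doing the backup. The database name, user name and output file are placeholders, and the helper names `mysqldump_command` and `run_dump` are illustrative.

```python
import subprocess


def mysqldump_command(db_name, user, host="localhost", out_file="backup.sql"):
    """Build the argument list for a mysqldump call."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--password",            # no value given: mysqldump prompts for it
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={out_file}",
        db_name,
    ]


def run_dump(cmd):
    """Run the dump; needs mysqldump on PATH and access to the database."""
    subprocess.run(cmd, check=True)


# Placeholder names; the real ones are in the site's wp-config.php.
cmd = mysqldump_command("wordpress_db", "wp_user")
print(" ".join(cmd))
```

Only the command is printed here. Calling `run_dump(cmd)` performs the actual dump, and the resulting .sql file goes alongside the files copied over FTP.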

  4. Nounours18200 Posted messages 248 Registration date   Status Member Last intervention   10
     

    I have since found an alternative to HTTrack called "Cyotek" that perfectly mirrored my site, including images and links, just like HTTrack is supposed to do.

    Since HTTrack has always made good local copies of websites for me, I assumed it would be capable of this too.

    I understand what you're saying about backups (of the database, for example), and I've been doing them for years, but if Cyotek copied my site perfectly, HTTrack should be able to do the same.
