help-debbugs

bug#43073: Trim/hide full email headers on debbugs


From: Bob Proulx
Subject: bug#43073: Trim/hide full email headers on debbugs
Date: Fri, 4 Dec 2020 19:49:49 -0700

Glenn Morris wrote:
> Note that the "db view tree" is the part that gets indexed by search
> engines. Search engines are (obviously) denied from the cgi bug pages,
> for reasons of system load. So if you get rid of the db pages, it will
> be impossible to search debbugs reports using standard web search
> engines.

I know I asked this on the mailing list, but I am going to repeat it
here so that it is in the ticket, and then add some more.

Where is the seed that the search engines start with to crawl the db
tree?  I couldn't find it.

Meanwhile...  I found this difference between the two systems.

    https://debbugs.gnu.org/robots.txt
      User-agent: *
      Disallow: /cgi-bin/
      Disallow: /cgi/

As you say, debbugs blocks the robots that comply.
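To double-check that reading of the debbugs file, here is a minimal sketch using Python's standard urllib.robotparser. The helper name debbugs_allows is made up for illustration, and the db-tree path /db/43/43073.html is a hypothetical example, not a confirmed debbugs URL.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as quoted above from https://debbugs.gnu.org/robots.txt
DEBBUGS_ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /cgi/
"""


def debbugs_allows(agent, path):
    """Return True if a compliant crawler named `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(DEBBUGS_ROBOTS.splitlines())
    return parser.can_fetch(agent, "https://debbugs.gnu.org" + path)


if __name__ == "__main__":
    # The CGI bug page is off limits to every compliant crawler.
    print(debbugs_allows("Googlebot", "/cgi-bin/bugreport.cgi?bug=43073"))  # False
    # Anything outside /cgi-bin/ and /cgi/ (here a HYPOTHETICAL db-tree page) is allowed.
    print(debbugs_allows("Googlebot", "/db/43/43073.html"))  # True
```

Since the only group is "User-agent: *", every compliant crawler gets the same answer, so the db tree is the only part a search engine can index.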

    https://bugs.debian.org/robots.txt
      User-Agent: Googlebot
      User-Agent: bingbot
      User-Agent: yandexbot
      User-Agent: baiduspider
      User-Agent: ia_archiver
      Allow: /cgi-bin/bugreport.cgi?bug=
      Allow: /cgi-bin/pkgreport.cgi?pkg=*;dist=unstable$
      Disallow: /*/
      User-agent: *
      Disallow: /

But upstream allows those robots to crawl the main CGI bug ticket
display pages.  Maybe they have better resources.  Was this previously
allowed on debbugs and then blocked due to load problems?
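Python's urllib.robotparser only does plain prefix matching and does not understand the `*` and `$` wildcards in the Debian file, so checking those rules needs a small matcher. The sketch below assumes Google-style semantics (as later written up in RFC 9309): a crawler uses the group that names it, falling back to `*`; `*` matches any run of characters; a trailing `$` anchors the end; the longest matching pattern wins, and Allow wins a tie. The helper names and the hand-made grouping are hypothetical.

```python
import re

# Rules quoted above from https://bugs.debian.org/robots.txt, grouped by hand.
# Each group: (user-agent tokens in lowercase, [(is_allow, path_pattern), ...]).
DEBIAN_GROUPS = [
    (["googlebot", "bingbot", "yandexbot", "baiduspider", "ia_archiver"],
     [(True, "/cgi-bin/bugreport.cgi?bug="),
      (True, "/cgi-bin/pkgreport.cgi?pkg=*;dist=unstable$"),
      (False, "/*/")]),
    (["*"], [(False, "/")]),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern: '*' matches any run, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def select_rules(groups, agent):
    """Use the group that names this crawler's product token, else the '*' group."""
    token = agent.split("/")[0].lower()
    for agents, rules in groups:
        if token in agents:
            return rules
    for agents, rules in groups:
        if "*" in agents:
            return rules
    return []


def is_allowed(groups, agent, path):
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best = None  # (pattern length, is_allow) of the winning rule so far
    for is_allow, pattern in select_rules(groups, agent):
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            candidate = (len(pattern), is_allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]  # no matching rule means allowed


if __name__ == "__main__":
    print(is_allowed(DEBIAN_GROUPS, "Googlebot", "/cgi-bin/bugreport.cgi?bug=43073"))  # True
    print(is_allowed(DEBIAN_GROUPS, "SomeOtherBot", "/cgi-bin/bugreport.cgi?bug=43073"))  # False
```

Under these assumptions the long Allow pattern for bugreport.cgi outranks the short "/*/" Disallow, so the named crawlers reach individual bug pages while everyone else is shut out by "Disallow: /".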

I am wondering whether we should allow it again as a test and then
see how things behave under the current load.  The main bug pages would
then be indexed, which would also avoid the problem.  WDYT?
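As a concrete (and purely hypothetical) shape for such a test, a debbugs robots.txt could let a few named crawlers fetch individual bug pages while keeping everything else under /cgi-bin/ and /cgi/ blocked. This is a sketch, not a vetted configuration: the crawler list and the helper name proposed_allows are assumptions, and the file is checked here with Python's urllib.robotparser. Allow is listed before Disallow because robotparser applies the first matching rule in a group; crawlers that use longest-match precedence reach the same answer.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL robots.txt for debbugs.gnu.org, loosely modelled on Debian's.
PROPOSED_ROBOTS = """\
User-agent: Googlebot
User-agent: bingbot
Allow: /cgi-bin/bugreport.cgi?bug=
Allow: /cgi/bugreport.cgi?bug=
Disallow: /cgi-bin/
Disallow: /cgi/

User-agent: *
Disallow: /cgi-bin/
Disallow: /cgi/
"""


def proposed_allows(agent, path):
    """Return True if `agent` may fetch `path` under the proposed file."""
    parser = RobotFileParser()
    parser.parse(PROPOSED_ROBOTS.splitlines())
    return parser.can_fetch(agent, "https://debbugs.gnu.org" + path)


if __name__ == "__main__":
    # Named crawler: individual bug pages open, other CGI reports still closed.
    print(proposed_allows("Googlebot", "/cgi/bugreport.cgi?bug=43073"))  # True
    print(proposed_allows("Googlebot", "/cgi/pkgreport.cgi?package=emacs"))  # False
    # Any other crawler: unchanged from today.
    print(proposed_allows("SomeOtherBot", "/cgi/bugreport.cgi?bug=43073"))  # False
```

Keeping the package and search reports blocked limits the extra load to one page per bug, which is roughly what the Debian file does.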

Bob

Me, who keeps making crazy brainstorm suggestions in the hope that
eventually one of them might work out. :-)




