unparseable index at http://mirror.bjtu.edu.cn/tdf/ #115

Open
poeml opened this issue Jun 5, 2015 · 1 comment
poeml commented Jun 5, 2015


Issue migrated (2015-06-05) from old issue tracker http://mirrorbrain.org/issues/issue115

Title:       unparseable index at http://mirror.bjtu.edu.cn/tdf/
Priority:    bug
Status:      in-progress
Superseder:
Nosy List:   floeff, poeml
Assigned To: poeml
Keywords:

msg404 (view) Author: floeff Date: 2012-07-20.16:29:28

It seems that http://mirror.bjtu.edu.cn/tdf/ changed their layout, resulting in
the following error message:

mirror.bjtu.edu.cn: unparseable HTML index in

msg491 (view) Author: poeml Date: 2014-01-31.00:02:03

I suggest a workaround:

Use rsync://mirror.bjtu.edu.cn/tdf/ for scanning, which should give better
results and should also be more efficient in general.

msg492 (view) Author: poeml Date: 2014-01-31.00:04:07

There's CSS in front of the HTML which contains
<style type="text/css"> table { border: none; ...

The "table" confuses the scanner, because that's what it looks for.
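To illustrate the failure mode described above, here is a minimal Python sketch. It is not the MirrorBrain scanner's code; the issue does not say how the scanner matches "table", so the keyword check below is an assumption, and the function names and the sample page (including the `libreoffice/` entry) are made up for illustration. The idea is that a CSS rule such as `table { border: none; }` satisfies a loose keyword match, while dropping `<style>` blocks and looking for a real `<table` tag does not.

```python
import re

# Matches a whole <style>...</style> block, including its CSS body.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
# Matches the start of an actual <table> element.
TABLE_TAG = re.compile(r"<table\b", re.IGNORECASE)


def naive_looks_like_table_index(html: str) -> bool:
    """Hypothetical loose check: the word 'table' appears anywhere."""
    return "table" in html.lower()


def looks_like_table_index(html: str) -> bool:
    """Remove <style> blocks first, then require a real <table> tag."""
    without_css = STYLE_BLOCK.sub("", html)
    return TABLE_TAG.search(without_css) is not None


# Made-up page: CSS mentioning "table" in front of a <pre>-style listing.
page = (
    '<style type="text/css"> table { border: none; } </style>\n'
    '<html><body><pre><a href="libreoffice/">libreoffice/</a></pre></body></html>'
)

fooled = naive_looks_like_table_index(page)   # the CSS rule triggers the match
robust = looks_like_table_index(page)         # no <table> element is present
```

With this sample, the loose check reports a table-based index even though the page contains none, while the second check does not.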

msg493 (view) Author: poeml Date: 2014-01-31.00:31:06

Or rather, it's a directory listing sent by nginx, and locally styled.

I'm a bit reluctant to adjust the existing parser for nginx directory listings
to this listing. I'm not sure if it's worth the effort, if it's only about this
mirror, for which the scanning can be done via rsync in a much more reliable way.
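For context, a parser for an unstyled nginx directory listing could look roughly like the sketch below. It is an assumption-laden illustration, not MirrorBrain's parser: it assumes the stock nginx autoindex layout (one `<a href>` per line inside `<pre>`, followed by a `dd-Mon-yyyy HH:MM` timestamp and a size or `-`), and the function name and sample entries are hypothetical. A locally styled listing like the one on this mirror would likely not match this pattern, which is the kind of special-casing the comment above is reluctant to add.

```python
import re
from datetime import datetime
from urllib.parse import unquote

# One listing entry: link, modification time, then size ("-" for directories).
ENTRY = re.compile(
    r'<a href="(?P<href>[^"]+)">[^<]*</a>\s+'
    r'(?P<mtime>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2})\s+'
    r'(?P<size>-|\d+)'
)


def parse_nginx_autoindex(html):
    """Return (name, mtime, size) tuples; size is None for directories."""
    entries = []
    for m in ENTRY.finditer(html):
        name = unquote(m.group("href"))  # decode %20 and similar escapes
        mtime = datetime.strptime(m.group("mtime"), "%d-%b-%Y %H:%M")
        size = None if m.group("size") == "-" else int(m.group("size"))
        entries.append((name, mtime, size))
    return entries


# Made-up sample in the assumed stock format; "../" has no timestamp and is skipped.
sample = (
    '<html><body><h1>Index of /tdf/</h1><hr><pre><a href="../">../</a>\n'
    '<a href="sub/">sub/</a>          31-Jan-2014 00:02        -\n'
    '<a href="a%20b.txt">a b.txt</a>  20-Jul-2012 16:29     1234\n'
    '</pre><hr></body></html>'
)

entries = parse_nginx_autoindex(sample)
```

Any change to the markup around the links or to the date format breaks a pattern like this, so each locally customized listing tends to need its own handling.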

msg494 (view) Author: poeml Date: 2014-01-31.00:37:38

Sigh, rsync times out when attempting a scan... There's also FTP; however, the
tdf directory is empty through that channel:

ftp://mirror.bjtu.edu.cn/mirror/tdf/

Too bad.

msg500 (view) Author: floeff Date: 2014-01-31.08:38:00

Your call ;-)
I think you have a valid point that a fix for only one mirror doesn't make
sense. Yep, I agree.

History
         Date          User  Action             Args
2014-01-31 08:38:00 floeff set    messages: + msg500
2014-01-31 00:37:38 poeml  set    messages: + msg494
2014-01-31 00:31:06 poeml  set    messages: + msg493
2014-01-31 00:04:07 poeml  set    messages: + msg492
2014-01-31 00:02:03 poeml  set    messages: + msg491
2013-01-31 22:03:03 poeml  set    assignedto: poeml
                                    nosy: + poeml
2013-01-31 22:02:58 poeml  set    status: unread -> in-progress
2012-07-20 16:29:28 floeff create

(end of migrated issue)
poeml added the bug label Jun 5, 2015

ideal commented Aug 12, 2015

Does this problem still occur?
