Title: makehashes: Regular expression not quoted
Priority: bug
Status: resolved
Superseder:
Nosy List: poeml, sascha_silbe, toma
Assigned To: poeml
Keywords:

Created on 2012-01-09.22:36:13 by sascha_silbe, last changed by poeml.

msg345 (view) Author: sascha_silbe Date: 2012-01-09.22:36:12
When running "mb -d makehashes /srv/upload -t /srv/mirrorbrain/hashes/srv/upload" on the server hosting, MirrorBrain breaks with the following error:

 2/QueryAll:  SELECT filearr.path, hash.file_id
                   FROM filearr
               LEFT JOIN hash
                   ON hash.file_id =
               WHERE filearr.path ~ '^services/gcc-c++/[^/]*$'
 2/QueryR  :  SELECT filearr.path, hash.file_id
                   FROM filearr
               LEFT JOIN hash
                   ON hash.file_id =
               WHERE filearr.path ~ '^services/gcc-c++/[^/]*$'
 2/COMMIT  :  auto
Traceback (most recent call last):
  File "/usr/bin/mb", line 1638, in <module>
    r = mirrordoctor.main()
  File "/usr/lib/pymodules/python2.6/", line 257, in main
    return self.cmd(args)
  File "/usr/lib/pymodules/python2.6/", line 280, in cmd
    retval = self.onecmd(argv)
  File "/usr/lib/pymodules/python2.6/", line 412, in onecmd
    return self._dispatch_cmd(handler, argv)
  File "/usr/lib/pymodules/python2.6/", line 1100, in _dispatch_cmd
    return handler(argv[0], opts, *args)
  File "/usr/bin/mb", line 1024, in do_makehashes
    for i, j in mb.files.dir_filelist(self.conn, dst_dir_db)]
  File "/usr/lib/pymodules/python2.6/mb/", line 160, in dir_filelist
    result = conn.Server._connection.queryAll(query)
  File "/usr/lib/python2.6/dist-packages/sqlobject/", line 356, in queryAll
    return self._runWithConnection(self._queryAll, s)
  File "/usr/lib/python2.6/dist-packages/sqlobject/", line 256, in _runWithConnection
    val = meth(conn, *args)
  File "/usr/lib/python2.6/dist-packages/sqlobject/", line 349, in _queryAll
    self._executeRetry(conn, c, s)
  File "/usr/lib/python2.6/dist-packages/sqlobject/", line 335, in _executeRetry
    return cursor.execute(query)
psycopg2.DataError: invalid regular expression: quantifier operand invalid

services/gcc-c++ is the name of a directory below /srv/upload:

silbe@sunjammer:~$ ls -d /srv/upload/services/gcc-c++

MirrorBrain should escape special (regular expression) characters in paths before using them as part of a regular expression.

Additional info:
The host is running MirrorBrain 2.15.0-1 on Ubuntu 10.04:

silbe@sunjammer:~$ lsb_release -ir
Distributor ID: Ubuntu
Release:        10.04
silbe@sunjammer:~$ dpkg -l mirrorbrain|grep ^ii
ii  mirrorbrain                       2.15.0-1                                        MirrorBrain is a scalable download redirector and Metalink generator.
msg358 (view) Author: toma Date: 2012-03-25.10:50:45
Bug confirmed. KDE runs into this one as well.
msg364 (view) Author: poeml Date: 2012-03-26.22:31:22
It seems to me that the only way to deal with this is to manually escape regexp 
special characters in the path names.

 select 'services/gcc-c++/a' ~ '***:^services/gcc-c\+\+/[^/]*$' as result;

There is no PostgreSQL function to do this, and there doesn't seem to be a way to 
embed a literal string inside a regular expression.
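
For illustration, the manual escaping described above could be sketched in Python roughly as follows. The helper name escape_regex_specials and the list of metacharacters are assumptions made for this sketch, not MirrorBrain code. The result is checked with Python's own re module, whose syntax differs from PostgreSQL's in places, and the backslashes would additionally need whatever escaping the SQL string literal requires.

```python
import re

# Characters with special meaning in POSIX-style regular expressions.
# (Assumed list for this sketch.)
REGEX_SPECIALS = set('.^$*+?()[]{}|\\')


def escape_regex_specials(path):
    """Hypothetical helper: prefix each regexp metacharacter with a backslash."""
    return ''.join('\\' + c if c in REGEX_SPECIALS else c for c in path)


escaped = escape_regex_specials('services/gcc-c++')
print(escaped)  # services/gcc-c\+\+

# Same shape as the pattern in the failing query: files directly in the directory.
pattern = '^' + escaped + '/[^/]*$'
print(bool(re.match(pattern, 'services/gcc-c++/a')))    # True
print(bool(re.match(pattern, 'services/gcc-c++/a/b')))  # False
```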
msg377 (view) Author: poeml Date: 2012-04-11.21:22:20
So the task is to pass a regexp from Python to PostgreSQL, through SQLObject and psycopg2, that 
contains some characters that need to be treated as literals.

So, let's pass all literal characters as literal characters! That is, encode every character of the path using the octal \000 syntax, so that none of them can act as a regexp metacharacter.

Fixed in r8271.

Index: ../mb/mb/
--- ../mb/mb/   (revision 8270)
+++ ../mb/mb/   (revision 8271)
@@ -1,5 +1,6 @@
 from sqlobject.sqlbuilder import AND
+from mb import util
 def has_file(conn, path, mirror_id):
     """check if file 'path' exists on mirror 'mirror_id'
@@ -156,7 +157,7 @@
                    FROM filearr 
                LEFT JOIN hash 
                    ON hash.file_id = 
-               WHERE filearr.path ~ '^%s/[^/]*$'""" % path
+               WHERE filearr.path ~ '^""" + util.pgsql_regexp_esc(path) +"""/[^/]*$'""" 
     result = conn.Server._connection.queryAll(query)
     return result
Index: ../mb/mb/
--- ../mb/mb/    (revision 8270)
+++ ../mb/mb/    (revision 8271)
@@ -210,3 +210,9 @@
         netloc = netloc.split('@')[1]
     return urlparse.urlunsplit((u[0], netloc, u[2], u[3], u[4]))
+def pgsql_regexp_esc(s):
+    if s:
+        return '\\\\' + '\\\\'.join(['%03o' % ord(c) for c in s])
+    else:
+        return s
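
To make the effect of the patch easier to see, here is a small standalone sketch that copies pgsql_regexp_esc from the diff above and prints what it produces for the directory from the report. The decode_octal helper is hypothetical and only serves to check the round trip. The doubled backslash presumably exists so that one backslash survives unescaping of the SQL string literal; that depends on the server's string-literal settings, which this sketch does not model.

```python
def pgsql_regexp_esc(s):
    """Copied from the patch: encode each character as \\\\ooo (octal)."""
    if s:
        return '\\\\' + '\\\\'.join(['%03o' % ord(c) for c in s])
    else:
        return s


def decode_octal(escaped):
    """Hypothetical inverse, used here only to verify the encoding."""
    # Splitting on the doubled backslash leaves an empty first element.
    return ''.join(chr(int(o, 8)) for o in escaped.split('\\\\')[1:])


path = 'services/gcc-c++'
escaped = pgsql_regexp_esc(path)

# 'c' is octal 143 and '+' is octal 053, so the tail ends in \\143\\053\\053.
print(escaped)
print(decode_octal(escaped) == path)  # True

# The WHERE clause fragment as the patched code assembles it.
print("WHERE filearr.path ~ '^" + escaped + "/[^/]*$'")
```

Note that '%03o' yields more than three digits for code points above octal 777, so this encoding looks safe only for byte strings or ASCII path names.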
Date                 User          Action  Args
2012-04-11 21:22:20  poeml         set     status: chatting -> resolved; messages: + msg377
2012-03-26 22:31:22  poeml         set     messages: + msg364
2012-03-26 21:47:00  poeml         set     assignedto: poeml; nosy: + poeml
2012-03-25 10:50:45  toma          set     status: unread -> chatting; nosy: + toma; messages: + msg358
2012-01-09 22:36:14  sascha_silbe  create