2007-06-08

Hitting the transaction limit

It looks like it will be necessary to occasionally break down and re-establish the connection to the database in the importDump.php utility. Running it for too long results in the following:
C:\Inetpub\wwwroot\wiki193\mediawiki\maintenance>"C:\Program Files\Zend\Core\bin
\php.exe" ImportDump.php s:\enwiki-20070402-pages-articles.xml
100 (50.09 pages/sec 50.09 revs/sec)
200 (51.75 pages/sec 51.75 revs/sec)
300 (50.43 pages/sec 50.43 revs/sec)
400 (50.78 pages/sec 50.78 revs/sec)
500 (51.62 pages/sec 51.62 revs/sec)
600 (52.31 pages/sec 52.31 revs/sec)
700 (51.62 pages/sec 51.62 revs/sec)
800 (51.35 pages/sec 51.35 revs/sec)
900 (51.24 pages/sec 51.24 revs/sec)
1000 (51.57 pages/sec 51.57 revs/sec)
1100 (51.85 pages/sec 51.85 revs/sec)
1200 (52.11 pages/sec 52.11 revs/sec)
1300 (51.62 pages/sec 51.62 revs/sec)
1400 (51.11 pages/sec 51.11 revs/sec)
1500 (50.16 pages/sec 50.16 revs/sec)
1600 (50.30 pages/sec 50.30 revs/sec)
1700 (50.09 pages/sec 50.09 revs/sec)
1800 (50.31 pages/sec 50.31 revs/sec)
1900 (50.39 pages/sec 50.39 revs/sec)
2000 (50.37 pages/sec 50.37 revs/sec)
2100 (50.34 pages/sec 50.34 revs/sec)
2200 (50.30 pages/sec 50.30 revs/sec)
2300 (50.41 pages/sec 50.41 revs/sec)
2400 (50.51 pages/sec 50.51 revs/sec)
2500 (49.89 pages/sec 49.89 revs/sec)
2600 (50.17 pages/sec 50.17 revs/sec)
2700 (50.34 pages/sec 50.34 revs/sec)
2800 (50.08 pages/sec 50.08 revs/sec)
2900 (50.18 pages/sec 50.18 revs/sec)
3000 (49.94 pages/sec 49.94 revs/sec)
3100 (48.71 pages/sec 48.71 revs/sec)
3200 (48.58 pages/sec 48.58 revs/sec)
3300 (48.43 pages/sec 48.43 revs/sec)
3400 (48.49 pages/sec 48.49 revs/sec)
3500 (48.51 pages/sec 48.51 revs/sec)
3600 (48.54 pages/sec 48.54 revs/sec)
3700 (48.52 pages/sec 48.52 revs/sec)
3800 (25.39 pages/sec 25.39 revs/sec)
3900 (15.09 pages/sec 15.09 revs/sec)
4000 (10.41 pages/sec 10.41 revs/sec)
4100 (8.62 pages/sec 8.62 revs/sec)
4200 (6.98 pages/sec 6.98 revs/sec)
4300 (4.98 pages/sec 4.98 revs/sec)
4400 (3.79 pages/sec 3.79 revs/sec)
4500 (3.20 pages/sec 3.20 revs/sec)
4600 (2.66 pages/sec 2.66 revs/sec)
4700 (2.45 pages/sec 2.45 revs/sec)
4800 (2.25 pages/sec 2.25 revs/sec)
4900 (2.13 pages/sec 2.13 revs/sec)
5000 (1.99 pages/sec 1.99 revs/sec)
5100 (1.83 pages/sec 1.83 revs/sec)
5200 (1.70 pages/sec 1.70 revs/sec)
5300 (1.61 pages/sec 1.61 revs/sec)
5400 (1.56 pages/sec 1.56 revs/sec)
5500 (1.49 pages/sec 1.49 revs/sec)
5600 (1.43 pages/sec 1.43 revs/sec)
5700 (1.37 pages/sec 1.37 revs/sec)
5800 (1.34 pages/sec 1.34 revs/sec)
5900 (1.30 pages/sec 1.30 revs/sec)
6000 (1.28 pages/sec 1.28 revs/sec)
6100 (1.23 pages/sec 1.23 revs/sec)
6200 (1.19 pages/sec 1.19 revs/sec)
6300 (1.13 pages/sec 1.13 revs/sec)
6400 (1.09 pages/sec 1.09 revs/sec)
6500 (1.04 pages/sec 1.04 revs/sec)
6600 (1.01 pages/sec 1.01 revs/sec)
6700 (0.98 pages/sec 0.98 revs/sec)
6800 (0.96 pages/sec 0.96 revs/sec)
6900 (0.95 pages/sec 0.95 revs/sec)
7000 (0.94 pages/sec 0.94 revs/sec)
7100 (0.91 pages/sec 0.91 revs/sec)
7200 (0.89 pages/sec 0.89 revs/sec)
7300 (0.87 pages/sec 0.87 revs/sec)
7400 (0.86 pages/sec 0.86 revs/sec)
7500 (0.86 pages/sec 0.86 revs/sec)
7600 (0.87 pages/sec 0.87 revs/sec)
7700 (0.87 pages/sec 0.87 revs/sec)
7800 (0.88 pages/sec 0.88 revs/sec)
7900 (0.86 pages/sec 0.86 revs/sec)
8000 (0.86 pages/sec 0.86 revs/sec)
exception 'com_exception' with message 'Source: Microsoft OLE DB Provider for SQL Server
Description: Cannot start more transactions on this session.' in C:\PHP\includes\adodb\drivers\adodb-ado5.inc.php:290
Stack trace:
#0 C:\PHP\includes\adodb\drivers\adodb-ado5.inc.php(290): com->BeginTrans()
#1 C:\Inetpub\wwwroot\wiki193\mediawiki\includes\DatabaseADODB.php(1135): ADODB_ado->BeginTrans()
#2 C:\Inetpub\wwwroot\wiki193\mediawiki\includes\JobQueue.php(170): DatabaseADODB->begin()
#3 C:\Inetpub\wwwroot\wiki193\mediawiki\includes\HTMLCacheUpdate.php(81): Job::batchInsert(Array)
#4 C:\Inetpub\wwwroot\wiki193\mediawiki\includes\HTMLCacheUpdate.php(46): HTMLCacheUpdate->insertJobs(Object(ResultWrapper))
#5 C:\Inetpub\wwwroot\wiki193\mediawiki\includes\Title.php(2424): HTMLCacheUpdate->doUpdate()
#6 C:\Inetpub\wwwroot\wiki193\mediawiki\includes\Article.php(2644): Title->touchLinks()
#7 C:\Inetpub\wwwroot\wiki193\mediawiki\includes\SpecialImport.php(363): Article::onArticleCreate(Object(Title))
#8 [internal function]: WikiRevision->importOldRevision()
#9 C:\Inetpub\wwwroot\wiki193\mediawiki\includes\DatabaseADODB.php(1048): call_user_func_array(Array, Array)
#10 C:\Inetpub\wwwroot\wiki193\mediawiki\includes\SpecialImport.php(510): DatabaseADODB->deadlockLoop(Array)
#11 [internal function]: WikiImporter->importRevision(Object(WikiRevision))
#12 C:\Inetpub\wwwroot\wiki193\mediawiki\maintenance\importDump.php(58): call_user_func(Array, Object(WikiRevision))
#13 [internal function]: BackupReader->handleRevision(Object(WikiRevision), Object(WikiImporter))
#14 C:\Inetpub\wwwroot\wiki193\mediawiki\includes\SpecialImport.php(761): call_user_func_array(Array, Array)
#15 [internal function]: WikiImporter->out_revision(Resource id #55, 'revision')
#16 C:\Inetpub\wwwroot\wiki193\mediawiki\includes\SpecialImport.php(426): xml_parse(Resource id #55, ']] and [[Botswa...', 0)
#17 C:\Inetpub\wwwroot\wiki193\mediawiki\maintenance\importDump.php(109): WikiImporter->doImport()
#18 C:\Inetpub\wwwroot\wiki193\mediawiki\maintenance\importDump.php(91): BackupReader->importFromHandle(Resource id #54)
#19 C:\Inetpub\wwwroot\wiki193\mediawiki\maintenance\importDump.php(129): BackupReader->importFromFile('s:\bauch\enwiki...')
#20 {main}
C:\Inetpub\wwwroot\wiki193\mediawiki\maintenance>

2007-06-07

Testing (and fixing) the Maintenance Utilities

With the wiki up and stable for a couple of weeks now, it was time to try to push some limits. To do this, I downloaded a dump of the English Wikipedia. The command-line tool ImportDump.php works for a while (albeit slower than I would have liked). Here's what prints to the console:
C:\Inetpub\wwwroot\wiki193\mediawiki\maintenance>"C:\Program Files\Zend\Core\bin
\php.exe" ImportDump.php s:\enwiki-20070402-pages-articles.xml
100 (3.27 pages/sec 3.27 revs/sec)
200 (1.90 pages/sec 1.90 revs/sec)
300 (1.40 pages/sec 1.40 revs/sec)
400 (1.13 pages/sec 1.13 revs/sec)
500 (1.04 pages/sec 1.04 revs/sec)
600 (1.01 pages/sec 1.01 revs/sec)
700 (0.79 pages/sec 0.79 revs/sec)
800 (0.74 pages/sec 0.74 revs/sec)
900 (0.68 pages/sec 0.68 revs/sec)
1000 (0.66 pages/sec 0.66 revs/sec)
1100 (0.63 pages/sec 0.63 revs/sec)
1200 (0.62 pages/sec 0.62 revs/sec)
1300 (0.59 pages/sec 0.59 revs/sec)
1400 (0.59 pages/sec 0.59 revs/sec)
A database query syntax error has occurred.
The last attempted database query was:
"COMMIT"
from within function "Database::deadlockLoop".
MySQL returned error "3902: e (jmolafbwikidev)"
This turns out to require some fixes in my DatabaseADODB.php. In particular, in the deadlockLoop() method. It also looks like my indexes on the page table need a little tweeking. After that, rebuildrecentchanges.php doesn't want to work at all
C:\Inetpub\wwwroot\wiki193\mediawiki\maintenance>"C:\Program Files\Zend\Core\bin
\php.exe" rebuildrecentchanges.php
PHP Notice:  Undefined variable: wgDBadminuser in C:\Inetpub\wwwroot\wiki193\med
iawiki\maintenance\rebuildrecentchanges.php on line 15
PHP Notice:  Undefined variable: wgDBadminpassword in C:\Inetpub\wwwroot\wiki193
\mediawiki\maintenance\rebuildrecentchanges.php on line 16
Loading from page and revision tables...
A database query syntax error has occurred.
The last attempted database query was:
"INSERT  INTO [recentchanges] (rc_timestamp,rc_cur_time,rc_user,rc_user_text,rc_
namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_
last_oldid,rc_type) SELECT TOP 5000  rev_timestamp,rev_timestamp,rev_user,rev_us
er_text,page_namespace,page_title,rev_comment,rev_minor_edit,0,page_is_new,page_
id,rev_id,0, IF(page_is_new != 0, 1, 0)  FROM [page],[revision]   WHERE (rev_tim
estamp > '20070531214411') AND (rev_page=page_id)  ORDER BY rev_timestamp DES
C"
from within function "rebuildRecentChangesTablePass1".
MySQL returned error "170: n (jmolafbwikidev)"
The fix for this is to promote the method conditional from the Database class to the DatabaseADODB class, where it is implemented as:
 function conditional( $cond, $trueVal, $falseVal ) {
  return " CASE WHEN $cond THEN $trueVal ELSE $falseVal END ";
 }
Running again reveals another problem:
C:\Inetpub\wwwroot\wiki193\mediawiki\maintenance>"C:\Program Files\Zend\Core\bin
\php.exe" rebuildrecentchanges.php
PHP Notice:  Undefined variable: wgDBadminuser in C:\Inetpub\wwwroot\wiki193\med
iawiki\maintenance\rebuildrecentchanges.php on line 15
PHP Notice:  Undefined variable: wgDBadminpassword in C:\Inetpub\wwwroot\wiki193
\mediawiki\maintenance\rebuildrecentchanges.php on line 16
Loading from page and revision tables...
Updating links...
A database query syntax error has occurred.
The last attempted database query was:
"SELECT rev_id FROM [revision] WHERE rev_page=87 AND rev_timestamp<'200706052
02808' ORDER BY rev_timestamp DESC LIMIT 1"
from within function "".
MySQL returned error "170: n (jmolafbwikidev)"
 
It looks like we need to add some alternative SQL to rebuildRecentChangesTablePass2() in rebuildrecentchanges.inc as follows:
if ( $wgDBtype == 'adodb' ) {
 $sql2 = "SELECT TOP 1 rev_id FROM $revision " .
  "WHERE rev_page={$lastCurId} ".
  "AND rev_timestamp < '{$emit}' ORDER BY rev_timestamp DESC";
} else {
 $sql2 = "SELECT rev_id FROM $revision " .
  "WHERE rev_page={$lastCurId} ".
  "AND rev_timestamp<'{$emit}' ORDER BY rev_timestamp DESC LIMIT 1";
}
And that fixes it.

2007-06-05

Changes required

In addition to the obvious (addition of the code to support Microsoft SQL Server), some other files need to change, and a few other files need to be added to the system.
includes/Autoloader.php (change)
This file needs just a couple of additions, as follows
static $localClasses = array(
    # Includes
    'DatabaseADODB' => 'includes/DatabaseADODB.php',
    ...
    'SearchADODB' => 'includes/SearchADODB.php',
    ...
includes/BagOStuff.php (change)
In SqlBagOStuff::get($key), the SQL needs a little fix, from
"SELECT value,exptime FROM $0 WHERE keyname='$1'", $key);
to
"SELECT value,exptime FROM $0 WHERE keyname=$1", $key);
Microsoft SQL Server is thrown off by what become extraneous single quotes. A similar change must be made to SqlBagOStuff::delete($key,$time=0). In the methods _serialize(&$data) and _unserialize($serial), I've had to remove the calls to gzdeflate and ginflate, since I couldn't get them to behave consistently.
includes/Database.php
To this base class, I've added two new abstract methods
public function setFetchModeAssoc() {
}
public function setFetchModeNum() {
}

These are used very little, and only by a couple of the Special: pages. By default, the ADODB connection operates in numeric fetch mode (i.e., it returns records in an array subscripted by numbers). After a call to setFetchModeAssoc the connection operates in associative fetch mode (i.e., it returns records in an array subscripted by field names). This is different from the MySQL connection, which returns arrays that can be indexed both ways.
includes/GlobalFunctions.php In function wfShellExec($cmd, &$retval=null), I changed the line
$cmd = '"' . $cmd . '"'; 
to
$cmd = 'cmd /C ' . '"' . str_replace( '/', '\\', $cmd) . '"';

In includes/MagicWord.php, I changed a line in MagicWord::initRegex() from
$case = $this-mCaseSensitive ? '' : 'iu';
to
$case = $this->mCaseSensitive ? '' : 'i';
I also had to rewrite matchAndRemove(&$text) and matchStartAndRemove(&$text)
In includes/Pager.php, method IndexPager::reallyDoQuery($offset,$limit,$ascending) the line
$res = $this->mDb->select( $tables, $fields, $conds, $fname, $options );
must be wrapped up like this:
$this->mDb->setFetchModeAssoc();
$res = $this->mDb->select( $tables, $fields, $conds, $fname, $options );
$this->mDb->setFetchModeNum();