Hi All,
Recently I had a thought about how trackback.php could discard spam, so I've made this minor alteration. Could it be reviewed and perhaps included? Thanks
The purpose is simply to reject trackback requests when the source of the connection is an IPv4 address listed in an RBL (I don't know how well maintained IPv6 RBL databases are).
This requires $wgUseTrackbacksRBL = true; to be added to LocalSettings.php.
Index: trackback.php
===================================================================
@@ -43,17 +43,33 @@
 $tburl = strval( $_POST['url'] );
 $tbname = strval( @$_POST['blog_name'] );
 $tbarticle = strval( $_REQUEST['article'] );
+$tbip = strval( $_SERVER['REMOTE_ADDR'] );
 $title = Title::newFromText($tbarticle);
 if( !$title || !$title->exists() )
 	XMLerror( "Specified article does not exist." );
+
+if( $wgUseTrackbacksRBL==true && substr_count( $tbip, ":" ) == 0 && substr_count( $tbip, "." ) > 0 ) {
+
+	$rbl_list = array( "zen.spamhaus.org", "dnsbl.njabl.org", "dnsbl.sorbs.net", "bl.spamcop.net" );
+
+	foreach( $rbl_list as $rbl_site ) {
+		$ip_arr = array_reverse( explode( '.', $tbip ) );
+		$lookup = implode( '.', $ip_arr ) . '.' . $rbl_site;
+		if( $lookup != gethostbyname( $lookup ) ) {
+			XMLerror( $tbip . " is listed in " . $rbl_site );
+		}
+	}
+}
+
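For anyone who has not worked with DNS blacklists, here is a minimal standalone sketch of the lookup the patch performs, written outside MediaWiki. The function names rblLookupName and isListedIn are made up for illustration. The relevant PHP detail is that gethostbyname() returns its argument unchanged when the name does not resolve, which is why the result is compared against the query.

```php
<?php
// Hypothetical helper names; not part of trackback.php or MediaWiki.

// Build the DNSBL query name: reverse the IPv4 octets and append the zone.
function rblLookupName( string $ipv4, string $zone ): string {
	return implode( '.', array_reverse( explode( '.', $ipv4 ) ) ) . '.' . $zone;
}

// gethostbyname() returns the unmodified name on failure, so any other
// return value means the zone holds an A record for this address.
function isListedIn( string $ipv4, string $zone ): bool {
	$lookup = rblLookupName( $ipv4, $zone );
	return gethostbyname( $lookup ) !== $lookup;
}

// Only the name construction is exercised here, so no DNS query is made.
echo rblLookupName( '192.0.2.1', 'zen.spamhaus.org' ), "\n";
// prints "1.2.0.192.zen.spamhaus.org"
```

Note that each call to isListedIn() is a blocking DNS query, so a request that checks several zones waits on each in turn.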
Thanks for your time.
On Wed, Jun 16, 2010 at 08:00:09PM +0100, ed neville wrote:
Hi All,
Recently I had a thought about trackback.php to attempt to discard spam so I've made this minor alteration. Could it be reviewed and perhaps included? Thanks
...
Index: trackback.php
@@ -43,17 +43,33 @@
 $tburl = strval( $_POST['url'] );
 $tbname = strval( @$_POST['blog_name'] );
 $tbarticle = strval( $_REQUEST['article'] );
+$tbip = strval( $_SERVER['REMOTE_ADDR'] );
 $title = Title::newFromText($tbarticle);
 if( !$title || !$title->exists() )
 	XMLerror( "Specified article does not exist." );
+if( $wgUseTrackbacksRBL==true && substr_count( $tbip, ":" ) == 0 && substr_count( $tbip, "." ) > 0 ) {
+	$rbl_list = array( "zen.spamhaus.org", "dnsbl.njabl.org", "dnsbl.sorbs.net", "bl.spamcop.net" );
+	foreach( $rbl_list as $rbl_site ) {
+		$ip_arr = array_reverse( explode( '.', $tbip ) );
+		$lookup = implode( '.', $ip_arr ) . '.' . $rbl_site;
+		if( $lookup != gethostbyname( $lookup ) ) {
+			XMLerror( $tbip . " is listed in " . $rbl_site );
+		}
+	}
+}
Any chance anyone can give their thoughts on the above?
On 06/16/2010 10:00 PM, ed neville wrote:
Index: trackback.php
@@ -43,17 +43,33 @@
 $tburl = strval( $_POST['url'] );
 $tbname = strval( @$_POST['blog_name'] );
 $tbarticle = strval( $_REQUEST['article'] );
+$tbip = strval( $_SERVER['REMOTE_ADDR'] );
 $title = Title::newFromText($tbarticle);
 if( !$title || !$title->exists() )
 	XMLerror( "Specified article does not exist." );
+if( $wgUseTrackbacksRBL==true && substr_count( $tbip, ":" ) == 0 && substr_count( $tbip, "." ) > 0 ) {
+	$rbl_list = array( "zen.spamhaus.org", "dnsbl.njabl.org", "dnsbl.sorbs.net", "bl.spamcop.net" );
+	foreach( $rbl_list as $rbl_site ) {
+		$ip_arr = array_reverse( explode( '.', $tbip ) );
+		$lookup = implode( '.', $ip_arr ) . '.' . $rbl_site;
+		if( $lookup != gethostbyname( $lookup ) ) {
+			XMLerror( $tbip . " is listed in " . $rbl_site );
+		}
+	}
+}
Some quick comments:
$_SERVER['REMOTE_ADDR'] won't work as expected if the server is running behind a transparent proxy. Should probably use wfGetIP() instead.
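To illustrate the proxy concern: behind a proxy, REMOTE_ADDR is the proxy's own address, and the client address usually arrives in an X-Forwarded-For header, which should only be believed when the connection comes from a proxy you control. A rough standalone sketch follows. The clientIp function and the trusted-proxy list are hypothetical, and this is not a description of how wfGetIP() is implemented, only of the general idea.

```php
<?php
// Hypothetical helper, not MediaWiki code: pick the client address, honouring
// X-Forwarded-For only while the current hop is a proxy we trust.
function clientIp( array $server, array $trustedProxies ): string {
	$ip = $server['REMOTE_ADDR'] ?? '';
	if ( !isset( $server['HTTP_X_FORWARDED_FOR'] ) ) {
		return $ip;
	}
	// The header is a comma-separated chain; the rightmost entry was added
	// by the proxy closest to us, so walk it from right to left.
	$chain = array_map( 'trim', explode( ',', $server['HTTP_X_FORWARDED_FOR'] ) );
	foreach ( array_reverse( $chain ) as $hop ) {
		if ( !in_array( $ip, $trustedProxies, true ) ) {
			break; // current address is not a trusted proxy: stop here
		}
		$ip = $hop;
	}
	return $ip;
}

// Example: one trusted proxy in front of the web server.
$server = array( 'REMOTE_ADDR' => '10.0.0.1', 'HTTP_X_FORWARDED_FOR' => '203.0.113.7' );
echo clientIp( $server, array( '10.0.0.1' ) ), "\n";
// prints "203.0.113.7"
```

Without the trusted-proxy check, any client could forge the header and make the blacklist lookup run against an address of its choosing.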
Testing for IPv4 addresses by checking for the presence of a period and the absence of colons seems a bit hacky, even if it'll mostly work. You may want to use IP::isIPv4() (and/or possibly IP::canonicalize()) instead. There are other methods in IP.php that may be useful too.
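As a quick illustration of where the dot-and-colon check falls short, the sketch below compares it with a strict validator. It is plain PHP outside MediaWiki, so filter_var() stands in for IP::isIPv4() here, and both function names are invented for the example.

```php
<?php
// The heuristic from the patch: any string with a dot and no colon passes.
function looksLikeIpv4( string $ip ): bool {
	return substr_count( $ip, ':' ) == 0 && substr_count( $ip, '.' ) > 0;
}

// Stand-in for a real validator when IP.php is not available (assumption:
// plain PHP outside MediaWiki). Accepts only well-formed dotted quads.
function isIpv4( string $ip ): bool {
	return filter_var( $ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4 ) !== false;
}

// Print both verdicts side by side for a few sample inputs.
foreach ( array( '192.0.2.1', '1.2.3', 'example.org', '::1' ) as $candidate ) {
	printf( "%-12s heuristic=%s strict=%s\n", $candidate,
		var_export( looksLikeIpv4( $candidate ), true ),
		var_export( isIpv4( $candidate ), true ) );
}
```

Inputs such as '1.2.3' or 'example.org' pass the heuristic but are rejected by the strict check, so only the latter guarantees that reversing the octets yields a meaningful lookup name.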
The list of blacklists to query should probably be configurable. It should perhaps default to the general $wgDnsBlacklistUrls list. Also, there's existing code for querying DNS blacklists in User::inDnsBlacklist(). You should probably reuse that instead of writing your own.
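To make the configurability point concrete, here is a minimal standalone sketch in which the zone list is passed in rather than hard-coded. Inside MediaWiki the list would come from a setting such as $wgDnsBlacklistUrls and the lookup from the existing User code; the function firstListingZone and the injectable resolver below are purely hypothetical and exist so the logic can be tried without touching the network.

```php
<?php
// Hypothetical sketch, not MediaWiki code. $zones plays the role of a
// configurable blacklist setting; $resolve defaults to gethostbyname() but
// can be replaced, e.g. to test without DNS access.
function firstListingZone( string $ipv4, array $zones, ?callable $resolve = null ): ?string {
	$resolve = $resolve ?? 'gethostbyname';
	$reversed = implode( '.', array_reverse( explode( '.', $ipv4 ) ) );
	foreach ( $zones as $zone ) {
		$lookup = $reversed . '.' . $zone;
		// Anything other than the query name coming back means a record exists.
		if ( $resolve( $lookup ) !== $lookup ) {
			return $zone;
		}
	}
	return null; // not listed in any configured zone
}

// Usage with a fake resolver that "lists" exactly one name.
$fake = function ( string $name ): string {
	return $name === '1.2.0.192.bl.example' ? '127.0.0.2' : $name;
};
var_export( firstListingZone( '192.0.2.1', array( 'other.example', 'bl.example' ), $fake ) );
// prints 'bl.example'
```

An empty zone list simply returns null, which gives a natural "feature disabled" state without needing a separate boolean switch.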
The usual procedure for submitting patches is to file a bug report on bugzilla.wikimedia.org (as an enhancement, in this case) and attach the patch to the bug. If nobody seems to have noticed the bug, you can then try to attract more attention to it here or on #mediawiki@freenode.
In general, this looks like a good idea. Having separate configuration variables is probably reasonable; some wiki maintainers might want to query blacklists only for trackbacks but not for normal editing (maybe because they're using other anti-spam mechanisms for that) or vice versa. Then again, I can see arguments for using the same config variables for both, too (simplicity).
wikitech-l@lists.wikimedia.org