Graceful E-Mail Obfuscation – A List Apart

In “Win the SPAM Arms Race” (A List Apart, May 2002), Dan Benjamin talked about the importance of hiding e-mail addresses on our websites from vicious, e-mail address harvesting bots (or spam bots, as they’re more commonly called). Dan pioneered a JavaScript-based solution for bypassing the indexing mechanisms that spam bots use. Here’s a quote from the article:

Posting a naked e-mail link anywhere on the web (or in a newsgroup, in a chatroom, on a weblog comments page…) is generally the kiss of death for your once-healthy address.

It’s hard to believe, but it’s been more than five years since Dan wrote these words. So, did we win the SPAM Arms Race? As you may have noticed by looking at your own inbox lately, not exactly. The Messaging Anti-Abuse Working Group (MAAWG) estimates that 90 billion spam messages are sent every day, and 80–85% of all incoming mail is abusive.

A shared responsibility#section2

Many web users don’t understand the inevitable consequences of exposing their e-mail address on the web. Professional web developers and site owners, however, do. Thousands of spam bots tirelessly crawl the web to collect e-mail addresses exposed on websites, in blog comments, and elsewhere. These addresses end up in databases sold to unsavory marketers, who bombard the owners’ inboxes with junk mail.

Of course, spam is an increasingly complicated problem that can never be solved by the efforts of web developers alone. But don’t underestimate your own powers.

An unpleasant surprise#section3

I work for a large non-profit organization that provides social services for the blind and visually impaired. After Wim, our system administrator, complained about the massive amounts of spam our mail server had to process, we started a small investigation. It turned out that 90% of all spam was sent to a mere 5% of the e-mail addresses we own, and guess what? They were exactly the addresses that had been published on our website.

Although most of the damage had been done by then (remember Dan’s quote), I promised Wim I’d come up with an effective way to protect the addresses on our upcoming portal, on which we intend to publish even more addresses.

My solution would need to defeat spam and be accessible. We work intensely with and for people who have (mostly visual) disabilities. Accessibility is not an optional add-on.

A few months ago, Wim very unexpectedly passed away (we miss you, Wim!). Since then, I’ve spent a lot of time thinking about a way to fight spam bots. In this article, I’ll share my ideas on the subject and leave you with a working script to build on or to use in your own projects immediately.

The problem with existing techniques#section4

Wikipedia has an excellent overview of anti-spam techniques. Their article also includes interesting links to articles about e-mail obfuscation. (Google the subject for more.) Over the years, I’ve tried more than a dozen of these techniques. Although most seem effective, I can’t use them in my projects, as every one fails to meet one or more essential requirements. My requirements are:

1. No hassle, please#section5

You’ve certainly seen e-mail links that look like “mailto:[email protected]” or “mailto:contact(at)company(dot)com”. If you’re like me, you probably don’t like to correct a deliberately misspelled e-mail address after you click on it. Moreover, users who don’t notice what’s wrong with the address will end up frustrated, because their message can’t be sent or delivered. Similar techniques require users to re-type a (correctly spelled) address that’s rendered as an image, which isn’t any better, of course.

Although they don’t require JavaScript, these methods of e-mail obfuscation add an unpleasant barrier to a task as trivial as sending an e-mail. Clearly, this isn’t the right way to treat visitors or (potential) customers. I want real, clickable e-mail links that work just as expected but, at the same time, are immune to spam bots.

2. Graceful degradation#section6

JavaScript-based techniques, like Dan’s, offer the seamless user experience I’m looking for. They’re all based on the simple fact that spam harvesters are incapable of parsing JavaScript or understanding DOM changes initiated by JavaScript events. Instead, spam harvesters try to extract e-mail addresses from raw HTML by using brute force algorithms; even Googlebot chokes on most of the JavaScript it comes upon. Only real browsers know how to handle JavaScript and can undo the obfuscation, either by stitching together document.writes or by using a more advanced, unobtrusive, event-based approach.

An important downside is that such solutions are not bulletproof. Visitors who surf the web without JavaScript support, whether by choice or not, are out of luck, because they’re treated as spam bots. These visitors include people using text browsers, old or incapable screen readers, or mobile devices with limited capabilities. Other users have JavaScript turned off for security reasons or because of company policies. W3Schools estimates that 6% of internet users have no access to JavaScript as of January 2007. As a comparison, if you believe that’s not enough to really care about, then maybe it’s time to rethink why you try to make your markup and CSS accommodate the 1.5% of IE 5.x users or the 1.3% of Safari users (again, W3Schools).

3. Install and forget#section7

Most e-mail obfuscation techniques I’ve tried tend to be bothersome and time-consuming to implement because they have to be applied to each e-mail address that you want to protect. Most require you to use lengthy inline script elements and inline event handlers. They may also invalidate your markup.

I wanted a clean and fully automated solution that I can set up once and never worry about again. That’s the only way I can guarantee that all addresses that appear on our website are protected, even the ones that show up in blog comments.

Putting it together#section8

Enough talking. Let’s get our hands dirty.

The ingredients#section9

You’ll need Apache 2 and PHP 4 or later. On the web server, the mod_rewrite module must be enabled and you should be able to set Apache directives through the use of .htaccess files. Most web hosts have this enabled by default, so you probably don’t have to worry about it. For help on these Apache-specific features, check out the Apache documentation.

Put on your mask#section10

Setting up Graceful E-Mail Obfuscation (GEO) involves several steps. The key is to replace all occurrences of mailto links with innocent-looking URLs. Take this e-mail link as an example:

<a href="mailto:sales@yourcompany.com">
  E-mail our sales department
</a>

After the server-side treatment (I’ll get to that in a minute), that same link will look like this:

<a href="contact/sales+yourcompany+com" rel="nofollow">
  E-mail our sales department
</a>

Let’s just take this one step further and apply some basic ROT13 to it.

<a href="contact/fnyrf+lbhepbzcnal+pbz" rel="nofollow">
  E-mail our sales department
</a>

From the results of web exposure tests I did with freshly created addresses, the ROT13 encryption didn’t seem to be necessary for the technique to be effective. However, it does add an interesting level of obfuscation that certainly won’t do any harm either. If you’re not familiar with ROT13, I should note that it doesn’t add real cryptographic security. Wikipedia gives an accurate description of what ROT13 does:

Applying ROT13 to a piece of text merely requires examining its alphabetic characters and replacing each one by the letter 13 places further along in the alphabet, wrapping back to the beginning if necessary.
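
As a quick illustration of that description, here is a minimal ROT13 sketch in JavaScript. The function name rot13 is my own choice for this example, and it only handles plain ASCII letters; digits, dots and hyphens pass through unchanged.

```javascript
// ROT13: shift each ASCII letter 13 places, wrapping around the alphabet.
// Anything that is not a letter is returned as-is.
function rot13(text) {
  return text.replace(/[a-z]/gi, function (ch) {
    var base = ch <= 'Z' ? 65 : 97; // char code of 'A' or 'a'
    return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
  });
}

console.log(rot13('sales'));        // fnyrf
console.log(rot13(rot13('sales'))); // sales
```

Because the alphabet has 26 letters, applying ROT13 twice restores the original text, so the same function can serve for both “encoding” and “decoding.”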

There are a couple of other things to note here:

I chose “contact” as a fake folder name for this example, but you can choose anything you like. To replace the “@” and the dot in the address, I opted for a “+”. A “+” rarely appears in real e-mail addresses (although it is technically permitted before the “@”) and it doesn’t have to be URL-encoded, which will come in handy later on.
The rel="nofollow" part is added to instruct search engines that they don’t need to follow these links and index subsequent pages. Read more about rel="nofollow" on Microformats.org.

Away with the mailtos! We’re left with plain old links. Well, except that they’re broken, maybe; but we’ll fix that soon enough. As you can imagine, there’s very little chance that a spam bot will identify these links as e-mail links, because…they’re not.

The script#section11

To replace each occurrence of a mailto link in a given webpage with a regular URL, I’ll use a PHP search-and-replace regular expression. The URL notation reuses parts of the original e-mail address so that it can be reconstructed later on. For this, we’ll take the entire HTML page as the subject of a PHP preg_replace() function:

function encrypt_mailto($buffer) {
  // Rewrite every "mailto:name@domain.tld" into "contact/name+domain+tld"
  return preg_replace(
    '/"mailto:([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\.([A-Za-z]{2,4})"/',
    '"contact/$1+$2+$3" rel="nofollow"',
    $buffer
  );
}

With ROT13 enabled, the encrypt_mailto() function looks quite a bit longer, as you’ll see in the finalized PHP class that you can download at the end of the article.
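
To get a feel for what the ROT13-enabled replacement has to do, here is a rough JavaScript sketch of the same search-and-replace. It is not the PHP class from the download: the names rot13 and encodeMailtoLinks are made up for this illustration, and the pattern simply mirrors the one used in encrypt_mailto().

```javascript
// ROT13 helper (ASCII letters only); name chosen for this example.
function rot13(text) {
  return text.replace(/[a-z]/gi, function (ch) {
    var base = ch <= 'Z' ? 65 : 97;
    return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
  });
}

// Hypothetical counterpart of encrypt_mailto() with ROT13 applied to each part.
function encodeMailtoLinks(html) {
  var pattern = /"mailto:([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\.([A-Za-z]{2,4})"/g;
  return html.replace(pattern, function (match, name, domain, tld) {
    return '"contact/' + rot13(name) + '+' + rot13(domain) + '+' + rot13(tld) +
      '" rel="nofollow"';
  });
}

var page = '<a href="mailto:sales@yourcompany.com">E-mail our sales department</a>';
console.log(encodeMailtoLinks(page));
// <a href="contact/fnyrf+lbhepbzcnal+pbz" rel="nofollow">E-mail our sales department</a>
```

A replacement callback is used instead of a plain replacement string because each captured part has to be transformed before it is put back into the URL.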

Now I want the script to intercept and parse all HTML pages before they’re sent to the browser. I’ll use PHP’s output buffering mechanism for that. In its simplest form, output buffering is activated by using a callback function:

ob_start("encrypt_mailto");

Using .htaccess, plus PHP’s little-known but powerful auto_prepend_file directive, we can now automate this process for an entire website or for specific folders only. If you add the following line to your .htaccess file, prepend.inc.php will be automatically included at the top of every PHP document that Apache serves.

php_value auto_prepend_file /yourpath/prepend.inc.php

The prepend.inc.php file itself initiates the output buffering and runs the entire contents of the served pages through the encrypt_mailto() function.

Also note that for this prepending to work properly, you need to make sure that PHP code in plain HTML documents (without the .php extension) is parsed by PHP as well. Add this line to the .htaccess file:

AddType application/x-httpd-php .php .htm .html

This might demand a bit more processing power from our web server, but it’s the easiest way to make sure that all our web pages get the server-side special treatment we need. If you’re using a CMS or some kind of application framework, you may opt to cache the server-side encryption.

Fixing the links#section12

Now that we’ve effectively disguised our mailto links, let’s see what happens when someone clicks one of these funny “contact/...” links. Well, apart from the Error 404 page: not much.

In the end, visitors shouldn’t notice anything unusual about our e-mail links. A few lines of JavaScript will help us to restore these links into their original shape. But wait: what about those 6% that don’t have JavaScript support? When JavaScript is not available, our “contact/” URLs will not be “decrypted” on the client side, resulting in a 404 error. Apache to the rescue!

Let’s configure Apache so that its mod_rewrite module will intercept all URL requests that match the pattern we defined earlier. Apache will then derive the segments that make up the e-mail address from the URL and pass them quietly to an intervening PHP script that undoes the ROT13 encryption and prepares the address for further processing. This is what the Apache rewrite rule looks like:

RewriteRule ^.*contact/([A-Za-z0-9._%-]*)\+([A-Za-z0-9._%-]*)\+([A-Za-z.]{2,4})$ /yourpath/mail.php?n=$1&d=$2&t=$3 [L]

The rule belongs on a single line in your .htaccess file; you can download an example .htaccess file at the end of the article.
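
To check which URLs the rule would catch, the same regular expression can be tried out in JavaScript. This is only a sketch that imitates the pattern matching, not Apache itself; matchContactUrl and rebuildAddress are hypothetical helpers, and the rebuild step assumes the parts were ROT13-encoded as in the example above.

```javascript
// ROT13 helper (ASCII letters only); name chosen for this example.
function rot13(text) {
  return text.replace(/[a-z]/gi, function (ch) {
    var base = ch <= 'Z' ? 65 : 97;
    return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
  });
}

// Same pattern as the RewriteRule, applied to the request path.
var rule = /^.*contact\/([A-Za-z0-9._%-]*)\+([A-Za-z0-9._%-]*)\+([A-Za-z.]{2,4})$/;

// Returns the n, d and t values the rule would pass to mail.php, or null.
function matchContactUrl(path) {
  var m = rule.exec(path);
  return m ? { n: m[1], d: m[2], t: m[3] } : null;
}

// What a fallback script might do with those values before using the address.
function rebuildAddress(params) {
  return rot13(params.n) + '@' + rot13(params.d) + '.' + rot13(params.t);
}

var params = matchContactUrl('article/contact/fnyrf+lbhepbzcnal+pbz');
console.log(rebuildAddress(params));              // sales@yourcompany.com
console.log(matchContactUrl('contact/about-us')); // null
```

Ordinary pages that happen to live under a “contact” folder are left alone, because the pattern only matches when two literal “+” separators are present.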

Providing an elegant fallback solution#section13

Here comes the fun part! Coming up with a safe, elegant and easy to use (or “graceful”) alternative for visitors to send an e-mail when JavaScript is unavailable is where your own imagination comes into play. How you do it depends on the type of website you’re using it for, but I don’t suggest using a visual captcha for this purpose: it’s quite likely that people who get to see this non-JavaScript page can’t see the captcha image either (either because they’re using a screen reader to compensate for a visual impairment, or because they’re using a text browser).

One solution would be to offer users a simple contact form that allows them to send a message without giving away the actual address. And if your website already uses a contact form, you may choose to redirect “unencoded” mailto links to that page.

Often, however, people do want the actual address. So, for this example, I decided to prompt the user with a question that’s hard for a spam bot to answer, but easy enough for humans. If the right answer is given, the script can safely assume that it’s not dealing with a spam bot and reveal the actual e-mail address.
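
The logic behind such a question can be very small. The sketch below is written in JavaScript purely to illustrate the idea; the real check has to run on the server, since this page is shown precisely when JavaScript is unavailable. The question, the accepted answers and the function names are all placeholders of my own.

```javascript
// Placeholder challenge: pick a question that suits your audience.
var challenge = {
  question: 'How many days are there in a week?',
  answers: ['7', 'seven']
};

// Accepts the answer regardless of case or surrounding whitespace.
function isHuman(answer) {
  var normalized = String(answer).trim().toLowerCase();
  return challenge.answers.indexOf(normalized) !== -1;
}

// Only hands out the mailto link when the question was answered correctly.
function revealAddress(answer, address) {
  return isHuman(answer) ? 'mailto:' + address : null;
}

console.log(revealAddress(' Seven ', 'sales@yourcompany.com')); // mailto:sales@yourcompany.com
console.log(revealAddress('8', 'sales@yourcompany.com'));       // null
```

A plain-text question keeps the fallback usable with screen readers and text browsers, which is the reason for avoiding an image-based captcha here.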

To see how this works, take a look at the demo page I put together. Be sure to turn off JavaScript to see the degradation in action. If you’re using the Web Developer Toolbar for Firefox, choose Disable > JavaScript > All JavaScript.

JavaScript for the rest of us#section14

Now that we’ve implemented a non-JavaScript fallback, let’s make sure that the other 94% of users won’t notice anything “funny” about our carefully masked e-mail addresses. So, let’s revert the page’s DOM to what it looked like before the page’s source code was modified by the PHP script.

First, we need a JavaScript search and replace regex that does exactly the opposite of what our PHP regex did. I wrote a function around it that looks like this:

function geo_decode(anchor) {
  var href = anchor.getAttribute('href');
  if (!href) { return; } // anchors without an href have nothing to decode
  // Turn ".../contact/name+domain+tld" back into "name@domain.tld"
  var address = href.replace(
    /.*contact\/([a-z0-9._%-]+)\+([a-z0-9._%-]+)\+([a-z.]+)/i,
    '$1' + '@' + '$2' + '.' + '$3');
  // Only rewrite links that actually matched the pattern
  if (href != address) {
    anchor.setAttribute('href', 'mailto:' + address);
  }
}

Next, we must loop through all anchors on the page and tie the geo_decode() function to each anchor’s onclick handler:

var links = document.getElementsByTagName('a');
for (var l = 0; l < links.length; l++) {
  links[l].onclick = function () {
    // "this" is the anchor that was clicked
    geo_decode(this);
  };
}

And finally, let’s make sure this loop runs once the page has loaded by placing it inside the window.onload handler:

window.onload = function () {
  // the anchor loop from the previous snippet goes here
};

To make things run smoothly, a little more code is involved. Take a look at geo.js.php to see how I implemented the ROT13 “decryption.” If you read through geo.phpclass.php, you’ll see that the link to geo.js.php (the file that restores your mailto links) is auto-inserted right before closing the head tag with the help of PHP’s output buffering. This means that you don’t have to add a single line of code to your existing documents to make the script work.
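
If you just want to see the idea without opening the download, the decoding step with ROT13 could look roughly like the sketch below. It is my own simplified illustration, not the contents of geo.js.php: it works on a plain href string so it can be tried without a DOM, and the names rot13 and decodeContactHref are assumptions.

```javascript
// ROT13 helper (ASCII letters only); name chosen for this example.
function rot13(text) {
  return text.replace(/[a-z]/gi, function (ch) {
    var base = ch <= 'Z' ? 65 : 97;
    return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
  });
}

// Turns ".../contact/<name>+<domain>+<tld>" back into a mailto link,
// undoing ROT13 on each part. Other hrefs are returned unchanged.
function decodeContactHref(href) {
  var m = /contact\/([a-z0-9._%-]+)\+([a-z0-9._%-]+)\+([a-z.]+)$/i.exec(href);
  if (!m) { return href; }
  return 'mailto:' + rot13(m[1]) + '@' + rot13(m[2]) + '.' + rot13(m[3]);
}

console.log(decodeContactHref('contact/fnyrf+lbhepbzcnal+pbz')); // mailto:sales@yourcompany.com
console.log(decodeContactHref('/about/team.html'));              // /about/team.html
```

In a page, the returned value would be written back with setAttribute('href', ...), exactly as geo_decode() does for the non-ROT13 case.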

I’ve set up a demo page for you to experiment with, and you can also play around with the source files:

.htaccess contains the Apache directives to prepend geo_prepend.php and to redirect page requests using mod_rewrite.
geo.prepend.php instantiates the PHP class and sets some custom properties.
geo.phpclass.php contains the PHP class that does the “encoding” and inserts a script tag before the closing head element that loads geo.js.
geo.js.php contains the JavaScript that’s responsible for the “decoding.”
mail.php contains an example of a usable fallback script for when JavaScript is unavailable.

…or download the ZIP archive (8 kB).

The script works in all major browsers, including Internet Explorer 5.01.

A solution. For now.#section16

Alas, no e-mail address that appears online is completely safe. Until all spam is banned from this world, we have to try our best not to make it too easy for spam harvesters to steal our addresses (and make money out of them). Now you can protect your addresses in a fully automated way while at the same time being gracious to all users, so you can focus on what’s really important: getting your content out.

This is only an interim solution. We should all be planning for the day when spam bots get smarter, and outwit them when they do. We should not pretend that legislation alone will be the silver bullet to tackle the world’s spam problem, so web developers have to continue to come up with creative solutions to fight the problem, and masking your addresses is one of them. I look forward to reading your comments and suggestions.