Alexa ranking in PHP
By maurizio | May 11, 2007
As I discussed in my earlier post about Alexa’s way of “hiding” numbers in its HTML code, today I wrote a small piece of code to retrieve the ranking automatically.
To stop people from easily “fetching” information from its web pages automatically, Alexa mixes random junk in with the real data. So if you try to copy and paste your apparent ranking of 123,456, you could end up pasting something like 14323,4533,534. The trick is simple: they put extra inline HTML into the page and then hide it from your eyes with CSS, although it is still there in the markup. Let me explain with an example.
1<span class="abcd">2</span>345
If you write this in an HTML file and open it with a browser (you don’t need to put it on a web server, because it is plain HTML and a browser can read it directly), you will see “12345”. The <span> element is used to group inline content: if you want to do something special with a piece of text that sits on the same line as other text, you put it inside <span> </span> and then apply whatever you want to that span. So to hide the junk digits in the example above, 14323,4533,534, you put the useless digits inside spans and then hide those spans.
In my small example you simply have to link a CSS file from the HTML you just created:
<link rel="stylesheet" type="text/css" href="style.css" />
1<span class="abcd">2</span>345
Then create “style.css” and save it in the same folder as the HTML file. Inside the CSS, simply say that you want to hide any element with the class “abcd”.
.abcd {
display: none
}
Save both files, then open the HTML file with your browser: you will now see “1345”. The hidden digit is still in the markup, though, so if you copy and paste the number it may come out as “12345”.
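To see why this trips up a naive scraper, here is a minimal sketch in PHP. It assumes the page has already been downloaded into a string; the $html value is just the toy example from above, not real Alexa markup. strip_tags removes the tags but keeps the text inside them, so the digit that the browser hides is still in the result.

```php
<?php
// Toy markup from the example above (not real Alexa output).
$html = '1<span class="abcd">2</span>345';

// strip_tags() drops the <span> tags but keeps the text inside them,
// so the digit hidden by the CSS is still part of the result.
$naive = strip_tags($html);

echo $naive, "\n";
```

In other words, removing the tags is not enough: the script also has to know which classes the stylesheet hides.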
My small project uses the usual getPage function mentioned in my “easy” explanation of how I check Technorati Favorites. I use that function to fetch both the page with the information I need and scramble.css, which says what to hide.
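$keyArray = array();
// Find the first "." (the start of a class selector) in the CSS
$position = strpos($cssResult, ".");
while ($position !== false) {
    // I've found a "."; let's take the next 4 characters as the class name
    $text = substr($cssResult, $position + 1, 4);
    array_push($keyArray, $text);
    // Look for the next "." after the one just handled
    $position = strpos($cssResult, ".", $position + 1);
}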
This is the main code that works out which classes (used on the <span> elements) I have to hide or delete.
- strpos: position in string. It returns the position of the first occurrence of the string “.” inside $cssResult, starting the search at the given offset, or false if there is none.
- substr: substring. It returns the piece of $cssResult that starts at the given position and is 4 characters long.
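As a rough illustration of how these two functions work together, here is a self-contained sketch. The helper name extractHiddenClasses and the sample stylesheet are made up for this example; the real scramble.css may look different, and the fixed length of 4 characters is simply the assumption used in the loop above.

```php
<?php
// Hypothetical helper: collects the 4-character class names that follow
// each "." in a stylesheet. The fixed length of 4 is an assumption taken
// from the loop in the post.
function extractHiddenClasses(string $cssResult): array
{
    $keyArray = array();
    $position = strpos($cssResult, ".");
    while ($position !== false) {
        // Take the 4 characters right after the "."
        $keyArray[] = substr($cssResult, $position + 1, 4);
        // Continue the search after the "." just handled
        $position = strpos($cssResult, ".", $position + 1);
    }
    return $keyArray;
}

// Made-up stylesheet for illustration only.
$css = ".abcd { display: none }\n.wxyz { display: none }";
$classes = extractHiddenClasses($css);
print_r($classes);
```

Note that this simple scan treats every “.” as the start of a class name, so a stylesheet containing values such as 0.5em would need a stricter match.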
Once I know which <span> elements I have to remove, I read the string with all the numbers and analyze it to strip out the useless junk. That part is too complicated to explain for now.
I will show the whole code only once I have written a cache for it. I don’t want you to hammer Alexa’s servers every second. :-)
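The actual cleanup code is not shown here, but the general idea can be sketched on the toy example. The helper removeHiddenSpans below is hypothetical and is not the code used in this project; it assumes simple, non-nested spans that carry exactly one class attribute, which real pages may not respect.

```php
<?php
// Hypothetical helper: removes every <span> whose class is listed in
// $classes, then strips whatever tags remain. Assumes simple, non-nested
// spans with a single class attribute, as in the toy example.
function removeHiddenSpans(string $html, array $classes): string
{
    foreach ($classes as $class) {
        // preg_quote() escapes any regex metacharacters in the class name
        $pattern = '/<span class="' . preg_quote($class, '/') . '">.*?<\/span>/s';
        $html = preg_replace($pattern, '', $html);
    }
    return strip_tags($html);
}

// Toy markup from the earlier example (not real Alexa output).
$html = '1<span class="abcd">2</span>345';
$clean = removeHiddenSpans($html, array('abcd'));
echo $clean, "\n";
```

A regular expression is enough for markup this simple; for anything more irregular, an HTML parser would be the safer choice.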
Topics: Content Creation, Programming