    Beta Update

    By maurizio | September 21, 2007

    Today I will add a couple of features to my beta tag search, but first I have to fix some bugs. The most annoying one is with number tags: some tags contain a slash, and of course the slashes are treated as path separators rather than as part of the tag, so I get blank pages (Google Webmaster Tools found them for me :-) ).
    A second issue is that the numbers page doesn't have a template... oops :-)
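
    One common fix for the slash problem is to percent-encode the whole tag, slashes included, when building links, and to decode it again when serving the page. This sketch assumes tag pages live under a path like /tag/<name>. The real URL scheme isn't shown in the post, so `TAG_PREFIX`, `tag_url` and `tag_from_path` are hypothetical names. It uses only the Python standard library:

```python
from urllib.parse import quote, unquote

# Hypothetical URL prefix for tag pages; the real site layout is not known.
TAG_PREFIX = "/tag/"


def tag_url(tag: str) -> str:
    """Build a tag page path, percent-encoding every reserved character.

    safe="" makes quote() encode "/" as %2F, so a tag such as "24/7"
    stays a single path segment instead of being split in two.
    Non-ASCII tags are encoded as UTF-8 bytes.
    """
    return TAG_PREFIX + quote(tag, safe="")


def tag_from_path(path: str) -> str:
    """Recover the original tag from a path built by tag_url()."""
    if not path.startswith(TAG_PREFIX):
        raise ValueError(f"not a tag path: {path!r}")
    return unquote(path[len(TAG_PREFIX):])


if __name__ == "__main__":
    for tag in ["24/7", "1/2", "2007"]:
        url = tag_url(tag)
        print(url, "->", tag_from_path(url))
```

    Some web servers refuse encoded slashes in the path by default. Apache, for instance, answers 404 unless `AllowEncodedSlashes` is enabled. Passing the tag as a query parameter (such as `?tag=24%2F7`) may therefore be the safer variant.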

    The big issue is with Chinese/Japanese/whatever tags. I am using UTF-8 to print every kind of character. I can see Arabic and Indian characters in my database as well, but I still see some problems with how they are rendered on screen and in the HTML. I am not sure yet whether the problem is due to old data in the database or something else. One thing is sure: I have to read some pages again, because some of them are not encoded in UTF-8, so I don't recognize or convert their characters.
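
    Re-reading those pages only helps if the crawler decodes each page with its real charset before storing it as UTF-8. The post doesn't say how the crawler does this today, so the sketch below rests on assumptions. It tries the charset from the HTTP `Content-Type` header first. Next it tries a `<meta charset>` declaration near the top of the page, then UTF-8. As a last step it falls back to Windows-1252, which is my own guess at a sensible default for undeclared Western pages. `to_unicode` and `charset_from_header` are hypothetical names.

```python
import re
from typing import List, Optional

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)
_HEADER_CHARSET = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Return the charset=... value of an HTTP Content-Type header, if any."""
    if not content_type:
        return None
    m = _HEADER_CHARSET.search(content_type)
    return m.group(1) if m else None


def to_unicode(raw: bytes, content_type: Optional[str] = None,
               fallback: str = "windows-1252") -> str:
    """Decode a fetched page: header charset, then <meta>, then UTF-8, then fallback."""
    candidates: List[Optional[str]] = [charset_from_header(content_type)]
    m = _META_CHARSET.search(raw[:4096])  # declarations normally sit near the top
    candidates.append(m.group(1).decode("ascii") if m else None)
    candidates += ["utf-8", fallback]
    for enc in candidates:
        if not enc:
            continue
        try:
            return raw.decode(enc)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or bytes that don't fit it: try the next one.
            continue
    # Last resort: replace undecodable bytes with U+FFFD instead of failing.
    return raw.decode("utf-8", errors="replace")
```

    Store the returned `str` in the database and always write it out as UTF-8; that keeps the stored data and the HTML output consistent. A page that declares the wrong charset can still decode without errors, so this approach cannot detect a declaration that is simply wrong.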

    The thing is, I don't want to re-read 300,000 sites just for that... what happens if I discover some other issue? (Actually I already have one: a lot of people using FeedBurner forgot to update the HTML tag for their RSS feed, so basically they have two feeds, the internal one and the FeedBurner one. So don't be upset if your subscriber count on FeedBurner isn't that high.) Anyway, that is a good reason to start caching pages on my server, so I can easily re-read them if I have to.
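
    The FeedBurner case could be spotted automatically with a rough heuristic, sketched below under my own assumptions. The code collects two things from a page. The first is the feed URLs it advertises through `<link rel="alternate">` autodiscovery tags. The second is its ordinary `<a href>` links. If the page links to a FeedBurner address but its autodiscovery tag points somewhere else, the tag was probably never updated. The class and function names are hypothetical. The parser is Python's standard `html.parser.HTMLParser`.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect autodiscovery feed <link>s and ordinary <a href> links from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.autodiscovery = []
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute names arrive lowercased; values may be None
        href = a.get("href")
        if not href:
            return
        rel = (a.get("rel") or "").lower().split()
        ctype = (a.get("type") or "").lower()
        if tag == "link" and "alternate" in rel and ctype in FEED_TYPES:
            self.autodiscovery.append(href)
        elif tag == "a":
            self.anchors.append(href)


def is_feedburner(url: str) -> bool:
    host = urlparse(url).netloc.lower()
    return host == "feedburner.com" or host.endswith(".feedburner.com")


def stale_feed_tag(html: str) -> bool:
    """Heuristic: the page links to FeedBurner, but autodiscovery points elsewhere."""
    p = FeedLinkParser()
    p.feed(html)
    p.close()
    uses_fb = any(is_feedburner(u) for u in p.anchors)
    advertises_fb = any(is_feedburner(u) for u in p.autodiscovery)
    return uses_fb and bool(p.autodiscovery) and not advertises_fb
```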
    I want to add that page cache, but I was thinking of rewriting the crawler part at the same time, this time in Python. I don't know Python very well, but I see that a lot of companies (Google, for example) use it for their crawlers. This could be a good opportunity to improve my CV and (finally) get a job. :-)
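
    A cache for this only needs to keep the raw bytes exactly as fetched. A later fix (to charset decoding, feed detection, or anything else) can then be replayed over the stored pages instead of re-crawling every site. The sketch below shows one possible layout, and the layout is entirely my assumption. Each page is stored under the SHA-1 hash of its URL. Next to it, a small JSON file holds the URL, the `Content-Type` header, and the fetch time. `PageCache` is a hypothetical name.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Iterator, Optional, Tuple


class PageCache:
    """Keeps the raw bytes of every fetched page so it can be re-parsed later."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def _paths(self, url: str) -> Tuple[Path, Path]:
        key = hashlib.sha1(url.encode("utf-8")).hexdigest()
        # Two-level fan-out (256 x 256 directories) keeps each directory
        # small even with hundreds of thousands of cached pages.
        folder = self.root / key[:2] / key[2:4]
        return folder / (key + ".body"), folder / (key + ".json")

    def put(self, url: str, body: bytes, content_type: Optional[str] = None) -> None:
        """Store the page exactly as fetched, plus the header needed to decode it."""
        body_path, meta_path = self._paths(url)
        body_path.parent.mkdir(parents=True, exist_ok=True)
        body_path.write_bytes(body)
        meta = {"url": url, "content_type": content_type, "fetched_at": time.time()}
        meta_path.write_text(json.dumps(meta), encoding="utf-8")

    def get(self, url: str) -> Optional[Tuple[bytes, Optional[str]]]:
        """Return (raw bytes, Content-Type) for a cached URL, or None if not cached."""
        body_path, meta_path = self._paths(url)
        if not (body_path.exists() and meta_path.exists()):
            return None
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        return body_path.read_bytes(), meta["content_type"]

    def urls(self) -> Iterator[str]:
        """Yield every cached URL, e.g. to re-run a fixed parser over all of them."""
        for meta_path in self.root.glob("*/*/*.json"):
            yield json.loads(meta_path.read_text(encoding="utf-8"))["url"]
```

    Re-processing then becomes a loop over `cache.urls()` that feeds each stored body back through the parser, with no network traffic.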

    I am also planning to record how many times each link is clicked, to add another parameter to the ranking of sites. Let me know what you think. (I will probably build some kind of widget to monitor site usage, like MyBlogLog/Technorati/whatever, so people can use it to rank better here.)
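
    Counting clicks usually means routing outbound links through your own redirect handler, which records the click and then redirects to the target. A hypothetical `/out?url=...` page would be one example. The storage side can be as small as the sketch below. It assumes SQLite and counts clicks per site host. The table layout and function names are hypothetical. How the counts would be weighted into the ranking is left open, as it is in the post.

```python
import sqlite3
from urllib.parse import urlparse


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the click database; ':memory:' is only for trying it out."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clicks ("
        " site TEXT PRIMARY KEY,"
        " n INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record_click(conn: sqlite3.Connection, target_url: str) -> None:
    """Count one click on target_url, aggregated per host name."""
    site = urlparse(target_url).netloc.lower()
    with conn:  # commits both statements together, or rolls back on error
        conn.execute("INSERT OR IGNORE INTO clicks (site, n) VALUES (?, 0)", (site,))
        conn.execute("UPDATE clicks SET n = n + 1 WHERE site = ?", (site,))


def top_sites(conn: sqlite3.Connection, limit: int = 10):
    """Return (site, clicks) pairs, most clicked first."""
    return conn.execute(
        "SELECT site, n FROM clicks ORDER BY n DESC, site LIMIT ?", (limit,)
    ).fetchall()
```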

    Topics: Content Creation, Ramblings
