Digiwar - the Yeep-blog

August 14th, 2006

Misconfigured webservers and semi-valid icons

Last night I implemented favicons in Feed Vortex. I’d seen this in FeedDemon and I really liked it. Instead of the icons I designed for Feed Vortex (which are pretty ugly), you now see the icon used by the site each feed belongs to. This adds some color and some familiarity to the feed list in Feed Vortex.
But downloading the icons is harder than you might expect. I expected to just get the favicon.ico at the root of the site and be done with it. Boy, was I wrong.

One thing that I ran into was that some sites didn’t have a favicon.ico at their root, but at some other location. They tell the browser this by adding a <link> tag to the page. Fair enough. Maybe I’ll figure out a way around this, but this seems like a correct way to have the favicon at another position.
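
As a rough illustration of how a reader could honor such a tag, here is a minimal Python sketch (not the Feed Vortex code, which is a .NET program). It scans a page for the first <link> whose rel attribute mentions "icon" and falls back to /favicon.ico at the root otherwise. The names IconLinkFinder and find_favicon_url are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkFinder(HTMLParser):
    """Remembers the href of the first <link> whose rel mentions 'icon'."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        # rel can hold several words, e.g. "shortcut icon"
        rel = (attributes.get("rel") or "").lower().split()
        if "icon" in rel and attributes.get("href"):
            self.href = attributes["href"]


def find_favicon_url(page_url, html):
    """Return the icon URL named in the page, or fall back to /favicon.ico."""
    finder = IconLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.href or "/favicon.ico")
```

The page itself still has to be downloaded first, so this costs one extra request per site compared to guessing the root location.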

But then there are sites like www.apple.com that are simply misconfigured. Sure, apple.com has a favicon.ico at the right place. But they serve it as “text/plain”. In other words, I ask for the icon file and apple.com says: “Sure, here is the file. By the way, it’s just a text file.” But it isn’t. It’s a valid icon file… well, semi-valid, but I’ll get to that later.

There are also sites like the mono-project.com website. They don’t have a favicon.ico, but instead of returning a “404 - page not found” error, they send a “page moved” redirect (an HTTP 3xx code such as 301 or 302) and then serve a normal webpage with the “200 Everything is A-okay” HTTP code. So Feed Vortex never knows there is no icon, it just doesn’t understand the data. It expects an image file, but gets an HTML file. I must admit that my website is also guilty of this and I need to fix this ASAP. But this is apparently the default configuration my webhost uses.
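
One way to cope with both kinds of misconfiguration is to ignore what the server claims and look at the bytes themselves. The Python sketch below is an illustration only, not what Feed Vortex does: it checks the 6-byte header that every .ico file starts with (a reserved zero, the type 1 for icons, and the number of images) and treats anything that begins with "<" as a web page. The function names are invented for this example.

```python
import struct


def looks_like_ico(data):
    """Check the 6-byte ICO header: reserved=0, type=1 (icon), image count > 0."""
    if len(data) < 6:
        return False
    reserved, image_type, count = struct.unpack("<HHH", data[:6])
    return reserved == 0 and image_type == 1 and count > 0


def classify_favicon_response(status, body):
    """Decide what the server really sent; the Content-Type is ignored on purpose."""
    if status != 200:
        return "missing"
    if looks_like_ico(body):
        return "icon"
    if body.lstrip()[:1] == b"<":
        return "html"  # a web page served where the icon should be
    return "unknown"
```

A header check like this only says the file claims to be an icon. It would not catch the semi-valid files, which can have a fine header and still fail to decode.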

And then there are the semi-valid icon files. Sometimes a website serves an icon file and Firefox sees and understands it. Internet Explorer sees and understands it. Hell, even Windows Explorer can read it, but the .NET Bitmap class can’t and tells me it’s invalid data. Livejournal.com has such an icon.

All in all this wasn’t the simple job I thought it would be, but most of my feeds now have nice icons and I’ll figure out a way to deal with the others in time.

[Now playing: Shakira - Suerte]

May 13th, 2006

FeedME / Feed Vortex homepage

I’ve recently created a whole new homepage for my feed reader. The idea is that I’ll put general announcements concerning Feed Vortex on there. Of course, the technical articles will still be posted here. If you are interested, have a look at: http://www.feedvortex.com

[Now playing: Killswitch Engage - When Darkness Falls]

April 27th, 2006

Small update on FeedME

The plug-in redesign is as good as done. It’s not useful right now, a lot of stuff is still hardcoded, but the basics are there. If I ever decide to make this plug-in thing work I should be able to do so without too much effort.

I also had to deal with some bugs in the Google Reader. Hey, it’s beta and the API is not officially released, so this was to be expected. Wasn’t fun though.
Google Reader seemed to give me lots of duplicate articles when I requested the articles for a particular feed. I don’t know what caused it, but, for instance, the Channel9 feed gave me over 2500 articles. After some examination I noticed that a lot of them were duplicates. I’ve seen some articles reappear 44 times. So I filed a bug report for this in the Google Reader group and implemented a little check when downloading feeds. FeedME now checks if an article with that feed ID has already been downloaded and if it has, it won’t be added to the article list.
Of course I still see duplicates, but far fewer. The unread article count went from 2500+ to around 150. And the duplicates I see now can simply be marked as read so they won’t reappear. If you mark articles with the same feed ID as read, you’ve effectively marked all of them as read. And sometimes that’s not what you want and it will make you wonder where your unread articles went to. I speak from experience…
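
The check itself can be tiny. Here is a minimal Python sketch of the idea (FeedME is a .NET program, so this is not the real code), assuming each downloaded article is a dictionary with an "id" key. That key name and the function name are made up for the example.

```python
def drop_duplicates(articles):
    """Keep the first article for each ID, preserving the original order."""
    seen = set()
    unique = []
    for article in articles:
        if article["id"] in seen:
            continue  # already downloaded, skip the repeat
        seen.add(article["id"])
        unique.append(article)
    return unique
```

Articles that are duplicates in content but arrive with different IDs slip through a check like this, which would explain why some duplicates remain.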

So what’s still to come?
First the new website. I’m working on this, but it takes some time.
An installer. Yes, I want you to be able to simply double click something and have FeedME installed. Right now it’s just a zip file which you need to extract and copy somewhere manually.
And finally support for labels. FeedME already uses labels and shows them by pretending they’re folders. But you cannot add, remove or edit labels just yet. This needs to be possible as well.
After that, it’s beta 1 baby!



[Last played: Devil Driver - The fury of our maker’s hand]

April 19th, 2006

Google Reader SID, how to get it

When I announced I was going to write my own Google Reader front-end, I also mentioned I might put the Google Reader API specs on this website. I haven’t done this so far, so this might be a good time to start.
I’m not going to put up a page with it just yet, but I am going to document some of the things as blog entries.
One of these things is how to get the Google SID of the logged on user.

Before I can cover the obtaining itself, I must first tell you how Google Reader knows if you are logged in or not. While others, like NewsGator, might use a username and password combination in the API calls, Google Reader works through cookies. At least, as far as I know it does; there might be another way, but I don’t know it. One of the things I’ve seen people say is to just add the username and password in the URL when doing an API call, much like you would when accessing FTP from your browser. However, one of the security updates for IE included removing the ‘@’ as valid input when entering a URL in IE. This makes it impossible to add the username/password to the URL. FeedME uses the WinInet functions from Windows and these might not share the same URL cracker that IE does, but I just want to be sure I don’t shoot myself in the foot when Windows Vista comes out and they have secured the WinInet DLL there as well.
So cookies it is. Since I use WinInet it’s actually pretty easy. It shares the cookies with IE, so if you’ve logged into Google from IE, FeedME can use that cookie. If I used the .NET HttpWebRequest class (and its related classes) I wouldn’t share the cookies, but there is a work-around. Why don’t I use the standard .NET classes? Read this and you’ll understand.

Okay, so if you have the right cookie, you’re logged in. If you don’t, FeedME sends you to the login page using the embedded browser and allows you to log in from there. FeedME doesn’t even know your username and password!
But being logged in isn’t enough. Some of the stuff I want to do, like downloading your list of feeds, requires me to know your user ID, or as I’ve begun to call it, your Google SID. And that’s where I ran into a problem.

At first I would just ask for your reading-list, which basically is all the articles from all the feeds you are subscribed to. This call was based on the cookie and didn’t need the Google SID. In one of the XML nodes that came with that list, even if it was empty, was a string with the Google SID in it. So I parsed it and I was done. However, the XML has since changed, so I need to find a new way.

My first try was to see if maybe the Google SID moved somewhere else, but the only place I could see it was in the ID of the list. I think the ID can change very easily, so I don’t want to be dependent on that.
The only other place I saw the Google SID was in the labels and states of articles. You see, every article can have labels and states. These are just text strings that have a special meaning within the Google Reader. These labels or states have your Google SID in them. So I could parse them and be done, but I figured that maybe sometimes you don’t have a single article on your reading list.
To test this I created a new Google account and lo and behold, no articles and no Google SID. Damn!

But the web-based Google Reader interface appears to know my Google SID. So into the JavaScript I went.
Unfortunately the JavaScript from Google is compressed. And rightly so! I read somewhere it saves them bandwidth in the order of several GB per day. But the compression made the JavaScript indecipherable, so another dead end. Time to Google!

I found a lot of info on Google and the cookies, but nothing which helped me. I found someone else who had run into the same problem as me and had resorted to requesting the Google Reader UI page first and parsing that for the Google SID. This was something I had already considered, but only as a last resort.
And I also found a Ruby script by Ben Ferrari for deleting all your Google Reader subscriptions. He used the SID value of the Google cookie, but this is encrypted. I thought he might know how to decrypt it, so I read the source. No decryption, but I did notice that he did some API calls for which he would need the Google SID, but instead of the Google SID he would use a hyphen ('-'). Intrigued by this I tried it and it worked!!

Thank you Ben!!

Of course _now_ I look at the Google Reader blog and see that one of their tips also uses a hyphen instead of a Google SID. Ah well…

[Last played: Lacuna Coil - The Game]

April 18th, 2006

FeedME progress

I’m already working on the 4th alpha version of FeedME. I added a lot of stuff and decided it was time for an overhaul. That’s the downside to creating a program without a full-fledged design. But it makes the programming more fun.
The reason for the overhaul is that I wanted to add more and more stuff, but ran into problems implementing it. So far I’ve been able to find solutions that were simple and easy enough, which proves to me that I designed the program very well in my head up front. But now, with adding folders, it’s become more difficult. So I’ve decided to improve the design instead of breaking my head over getting it to work well enough. An overhaul is inevitable, so why not do it now?

I also found out that the name “FeedME” has been taken by various people/organizations, including one for another feed reader. So I’ve decided to rename the program. I won’t reveal the name until the website is done however. I’m working on that as well.

If you want to see what I’ve done so far, you can download FeedME Alpha 3 from my website. I think it will be a few weeks before I get to releasing Alpha 4. I want to make it work better and make it easier to improve first. I also think that Alpha 4 will be the last FeedME release. After that I will release the program with the new name and with an installer to make things easier. I will also announce the program more widely.

My plans for FeedME:
- Make a cool, basic, feed reader.
- Make it expandable so people can add other back-ends (I make it for use with the Google Reader back-end).
- Go after FeedDemon (All right, so this is a far-fetched dream, but what’s the point in doing something without a hard goal to achieve?).

Lastly I have 5 articles I still need to post. Or actually, take time to write about. I have them in my “drafts” list, but I still need to do them. I know that writing keeps me in touch with the people “out there” and the more I write the more people will find me through Google, but I’d rather be programming to be honest. But I’ll do my best.

[Last played: Evanescence - Taking Over Me]

March 11th, 2006

FeedME 1.0 Alpha 1

I have made very good progress on my news aggregator using the Google Reader back-end. It’s in a state where it’s useful enough for daily use. So that’s exactly what I’m going to do. Also, with a few minor adjustments, I’m going to release it as the first alpha version. I’m really curious whether people will use it in this state.

Back to Visual Studio for me!

[Last played: Marilyn Manson - Tainted Love]

March 9th, 2006

I will not REST until this problem is solved

The Google API for Google Reader has not been released. At least not officially. I found a really good heads-up at Niall Kennedy’s weblog. The information there is more than enough to get you started. The information in the main article is not complete though. You can find some more tidbits in the comments and I also found some more stuff while working on FeedME. It hadn’t occurred to me how easy it actually is to get some information about the API. I figured, it’s a web application, so all the interesting communication happens at the server itself, no way I can get to that. I had even downloaded the JavaScript file Google Reader uses for its work when it occurred to me that, yes, this is a web application. It’s a web 2.0 (I really hate that term, but now everybody should know what I mean) application. It uses Ajax, which means that all communication with the server backend originates from my own machine! Add Fiddler and you’ve got a neat protocol-sniffing setup :-)
I’m thinking about doing a page on the Google Reader API to put all the info I have at one location.

Anyways this is not about the Google Reader API, at least not in depth. What I am going to write about is the HttpWebRequest, WebResponse and the Uri classes of the .NET framework 2.0.
These were the classes that I wanted to use for communicating with Google. They are built for that purpose! In fact, they worked perfectly for the first API call I incorporated, the call to download your subscription list. After I had that working and had a really basic GUI I was ready to move on to the next thing: downloading the articles for the feeds I’m subscribed to.
The Google Reader API is a REST API (or at least based on some REST principles). This means it uses simple HTTP GET and HTTP POST commands to exchange XML. Okay, so this also covers SOAP, but the thing with REST is that it doesn’t have predefined XML to use. It’s all up to the implementation. Whereas SOAP has everything standardized except the data itself.

Getting your subscription list is just sending a GET at a specific URL, let’s say: “http://www.google.com/reader/api/subscriptions/” (This isn’t the real URL, just an example). You’ll then get some XML back that describes your subscriptions. Simple to do with HttpWebRequest.
To get the articles of a specific feed, you need to add the URL of the feed to the URL. An example (also not real): “http://www.google.com/reader/api/feed/http://www.digiwar.com/feeds/”
This would give you the articles from this website that Google have in their Reader database. Now here I ran into a problem.

The HttpWebRequest class takes, or converts your string into, a Uri object. The Uri object is a real useful class that does all kinds of useful stuff to the URL you supply. Like checking if it’s a valid URL, removing contradicting directory paths or removing those pesky double slashes some people put in their URLs by accident. Do you see the problem already? If not, I’ll give you 5 minutes to think about it.

Take your time.

You got it? Good. So you agree with me that I was really frustrated that my HttpWebRequests ended up going to “http://www.google.com/reader/api/feed/http:/www.digiwar.com/feeds/” instead of the correct URL. Nothing I tried would help. I tried double-double slashes, tried escaping them, tried encoding them as “%2F”. Nothing worked. I did a lot of Googling on the subject and inquired in newsgroups. Nowhere did I find an answer. So what did I do? I wrote my own HttpRequest, HttpResponse and Url classes.
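
To make the failure concrete, here is a small Python sketch that imitates the clean-up described above. It is not the .NET Uri implementation, just a stand-in with a made-up name (collapse_slashes) that squeezes repeated slashes everywhere except in the scheme’s own “://”.

```python
import re


def collapse_slashes(url):
    """Collapse runs of '/' in everything after the scheme's own '://'."""
    scheme, separator, rest = url.partition("://")
    return scheme + separator + re.sub(r"/{2,}", "/", rest)


wanted = "http://www.google.com/reader/api/feed/http://www.digiwar.com/feeds/"
actual = collapse_slashes(wanted)
# The embedded feed URL has lost a slash: ".../feed/http:/www.digiwar.com/feeds/"
```

A clean-up like this is harmless for ordinary URLs, but it cannot tell an accidental double slash from one that belongs to a URL embedded in the path.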

The Url class was really simple. It just houses a few strings and does some parsing in the constructor. It does no validation whatsoever.
The HttpRequest and the HttpResponse classes were a bit harder. For them I P/Invoked a lot of WinInet functions. One important thing learned here: If you ever need the P/Invoke signature of a Win32 function, or at least a good pointer in the right direction, just google for “dllimport” followed by the name of the function. Also visit pinvoke.net, it’s really useful.
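
For a feel of how little such a Url class needs, here is a rough Python sketch of the same idea: a few strings, some splitting in the constructor, and no validation or normalization at all. The real class is written in .NET, and the attribute names here (scheme, host, path) are assumptions made for the example.

```python
class Url:
    """Splits a URL string into parts and keeps them exactly as given."""

    def __init__(self, text):
        self.text = text
        scheme, separator, rest = text.partition("://")
        if not separator:
            # No scheme present: treat the whole string as host + path
            scheme, rest = "", text
        self.scheme = scheme
        host, slash, path = rest.partition("/")
        self.host = host
        self.path = slash + path  # double slashes in the path survive untouched

    def __str__(self):
        return self.text
```

Because nothing is cleaned up, the feed URL embedded in the path comes out exactly as it went in.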

And of course, after I had everything up and running (beautifully I might add) someone found an answer for me. Apparently it works if you encode the second slash of the embedded double slash as ‘\u2215’. So you’ll end up with “http://www.google.com/reader/api/feed/http:/\u2215www.digiwar.com/feeds/” and this seems to work. I still have to test it in a real-world scenario (such as my program), but I’ve just invested a lot of time in my custom classes and they’re working perfectly, so why switch?
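
The trick can be sketched in a few lines of Python. This only shows the string manipulation; whether the server accepts the result is exactly the part that is still untested, and the function name embed_feed_url is made up for the example. U+2215 is the Unicode “division slash”, which looks like a slash but is a different character, so a clean-up that collapses “//” leaves it alone.

```python
DIVISION_SLASH = "\u2215"  # looks like '/', but is a different character


def embed_feed_url(api_prefix, feed_url):
    """Append feed_url to api_prefix, disguising the second slash of '://'."""
    disguised = feed_url.replace("://", ":/" + DIVISION_SLASH, 1)
    return api_prefix + disguised


url = embed_feed_url(
    "http://www.google.com/reader/api/feed/",  # example prefix, not the real API URL
    "http://www.digiwar.com/feeds/",
)
```

The only “//” left in the result is the one right after the outer “http:”, so there is nothing for a double-slash clean-up to remove.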
I might switch back to the .NET classes sometime in the future, but for now I’m content with how it is.

[Now playing: The Kovenant - Jihad]

March 7th, 2006

Writing your own news aggregator

So I took the task upon myself to write a Windows application that allows me to sync with Google Reader and read the feeds I have in Google Reader. This is not a post for plugging that application though. This is a post, or probably one of a series of posts, about the things I learned while programming the application.

But to spend just a few words on the application: I’m going to call it “FeedME”. I had thought up that name probably two years ago when me and a colleague were already considering writing a feed reader. The project never got really far, but the name stuck.

So far the application allows you to log in, retrieve the feeds you’re subscribed to and then the unread articles from those feeds. The articles are shown in a real simple newspaper format in an embedded Internet Explorer control. That’s about it.
The biggest hurdles I crossed so far? Downloading the feeds from Google (getting the subscription list is easy, getting the articles of a particular feed is much harder, more on that later) and logging into Google.

Logging into the Google Reader is complex, but not really that hard when you think about it. The thing is you can’t just send a username and password combination to a web service. At least not that I know of. No, you have to use cookies and then, somehow, find out the Google SID for the user. I will explain this in detail later!

So, expect some articles about the problems I encountered and how I solved them (or worked around them) in the future. And if I get it into a workable state, I will release FeedME for everyone to use. Free of charge!

Right now I’m gonna see if I can come up with a nice way to mark a separate article as ‘read’ and ‘unread’, how to tell Google that, and how to show that in my application (no support for read articles yet).

[Last played: Static-X - Trance is the motion]
