Downloading an Entire Blog Site

A couple of weeks ago I discovered Waiter Rant, an amusing blog written by a waiter who works in an expensive Italian restaurant. It’s a great read, and the guy has written over 300 posts! So, do I want to sit there and download them all one by one? Not really.

Plus, the worst part is that his pages are huge:
<pre>Documents     (1 file)    23 kb
Images        (36 files)  58 kb
Style Sheets  (1 file)     9 kb
Scripts       (3 files)    5 kb
Total                     94 kb
</pre>
94 kb for one post! That means downloading his 330+ post website would take over 29 meg! That’s just freaking stupid. Do I really want to download his ads every time I want to read a story about a bitchy customer? Nope. So instead I wrote a program to download just the necessary parts and compile them all together.
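If you want to check the arithmetic, here's a minimal sketch. The class and method names (`SizeEstimate`, `TotalMegabytes`) are made up for illustration, and 330 is an assumed round post count. The 10 kb figure is the trimmed per-post size the program ends up with.

```csharp
using System;

public class SizeEstimate
{
    // Total size in megabytes for `posts` pages of `kbPerPost` kilobytes each
    // (taking 1 MB = 1024 kB).
    public static double TotalMegabytes(int posts, int kbPerPost)
    {
        return posts * kbPerPost / 1024.0;
    }

    public static void Main()
    {
        int posts = 330; // assumption: round figure for "over 330 posts"
        Console.WriteLine("Full pages:    {0:F1} MB", TotalMegabytes(posts, 94));
        Console.WriteLine("Trimmed posts: {0:F1} MB", TotalMegabytes(posts, 10));
    }
}
```

That works out to roughly 30 MB for the full pages versus a bit over 3 MB for the trimmed posts.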

Each post now comes in at a handsome 10k, so his entire archive comes to about 3 megs, and I can read it offline. Cool. The code is C# (.NET), and to compile it you can use:

\WINDOWS\Microsoft.NET\Framework\v1.1.4322\csc req.cs

and then you can run the executable.
<pre>
using System;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;

public class req
{
    static WebProxy proxy;

    // Fetch post number `number` and return its HTML joined into one line,
    // stopping just before the comment section.
    public static String getHtml(int number)
    {
        WebRequest request = WebRequest.Create("http://waiterrant.net/?p=" + number);
        request.Proxy = proxy;
        WebResponse resp = request.GetResponse();
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string output = "";
        string line = sr.ReadLine();
        // ReadLine returns null at end of stream, so check for that before matching.
        while (line != null && !Regex.IsMatch(line, "Subscribe to comments with"))
        {
            output = output + line;
            line = sr.ReadLine();
        }
        sr.Close();
        resp.Close();
        return output;
    }

    static void Main()
    {
        proxy = new WebProxy("http://myproxy:80/", true);
        proxy.Credentials = new NetworkCredential("proxyusername", "proxypassword", "proxydomain");

        int startpost = 1;
        int endpost = 10;
        // Verbatim string, so the \t in the path is not read as a tab.
        StreamWriter writer = File.CreateText(@"c:\temp\waiter2.htm");
        // Newest post first, down to and including startpost.
        for (int i = endpost; i >= startpost; i--)
        {
            String html = getHtml(i);
            // The page is a single line here, so .* can span the whole post.
            Match m = Regex.Match(html, "<div class=\"archivepost\">(.*)from your own site");
            if (m.Success)
            {
                writer.WriteLine("<table><tr><td>" + m.Value + "</div></div></td></tr></table>");
                // Also writes the page as fetched (everything before the comments).
                writer.WriteLine(html);
                Console.WriteLine("Downloaded post: " + i);
            }
        }
        writer.Close();
    }
}</pre>
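To see what the regular expression actually keeps, here's a small sketch that runs the same pattern against a made-up page instead of the real site. The sample HTML and the names `ExtractDemo` and `ExtractPost` are invented for illustration. I'm only assuming the real pages have an `archivepost` div followed by the "from your own site" trackback text, as the pattern in the program implies.

```csharp
using System;
using System.Text.RegularExpressions;

public class ExtractDemo
{
    // Return the span from the opening archivepost div through the
    // "from your own site" marker, or "" when the page has no such span.
    public static string ExtractPost(string html)
    {
        Match m = Regex.Match(html, "<div class=\"archivepost\">(.*)from your own site");
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        // Made-up page, already flattened to one line the way getHtml returns it.
        string page = "<html><body><div id=\"ads\">buy stuff</div>"
            + "<div class=\"archivepost\"><h2>Title</h2><p>Story text.</p>"
            + "<p>You can trackback from your own site.</p></div></body></html>";
        Console.WriteLine(ExtractPost(page));
    }
}
```

The ads div at the front and the closing tags at the end get dropped. Note that `.` does not match a newline by default, so this only works because the page has been joined into a single line first.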
PS Paris the Homemaker is still up.