VB help


Draft Dodger
09-18-2009, 12:42 PM
hopefully there's someone versed enough in VB.NET who can help me here.

I have a little program that grabs data from a web page. I'm running into trouble with a core function. I think I know what's wrong, I just don't know how to resolve it.

here's an example of a page I would be opening:
Game Summary (http://www.nhl.com/scores/htmlreports/20082009/GS010019.HTM)

and the relevant function to dump that into one big text file

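<pre>
Function WebPageToText(ByVal url As String) As String
    Dim GetWeb As New Net.WebClient()
    ' Open a stream over the response body for the given URL
    Dim webStream As IO.Stream = GetWeb.OpenRead(url)

    ' Read the whole response as text
    Dim reader As New IO.StreamReader(webStream)
    Dim text As String = reader.ReadToEnd()

    Return text
End Function
</pre>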

that worked fine last year. this year, when I try to run the same function on a newer page, instead of the entire source code, I get a couple of gibberish characters. I think the issue is that there's now a null line at the beginning of the source code on the new page (http://www.nhl.com/scores/htmlreports/20092010/GS010019.HTM) that the old page (http://www.nhl.com/scores/htmlreports/20082009/GS010019.HTM) didn't have. But I'm not really sure how to resolve this. I've tried to do something line by line, but I still just get lines of gibberish. My coding skills have never been great, and I'm rusty as it is, so maybe I'm missing something really obvious here. Any help would be greatly appreciated.

GoldenEagle
09-18-2009, 12:54 PM
Try this...

Dim reader As New IO.StreamReader(webStream, System.Text.Encoding.Default)

Draft Dodger
09-18-2009, 02:47 PM
hmmm. I made that change and now I get different garbled text.

I also tried to do it line by line in a little 100 count for loop (text = reader.readline()), and I get about 25 lines of garbage and then 75 lines of text=nothing, even though the source code is much longer than 25 lines

Radii
09-18-2009, 05:29 PM
The data is compressed...

C#, not VB but hopefully easy enough to see:

static void Main(string[] args)
{
    WebClient webClient = new WebClient();

    const string strUrl = "http://www.nhl.com/scores/htmlreports/20092010/GS010019.HTM";

    webClient.DownloadFile(strUrl, "C:\\testfile.zip");
}


I ran that as my entire program (with the necessary .NET stuff surrounding it of course), and the resulting file can be opened by 7-zip and has all the data you're looking for.

Draft Dodger
09-18-2009, 05:31 PM
it's compressed? *scratches head*

Radii
09-18-2009, 06:05 PM
do you have .NET 3.5 or an older version? It looks like there is a "GZipStream" class that was added in the 2008 version of things. I've got it working in C#, it's copied from some MSDN examples so it hopefully isn't too big a deal to get it going in VB too.

Draft Dodger
09-18-2009, 06:17 PM
well, you lost me. are you saying the page I'm downloading is compressed?

Radii
09-18-2009, 06:25 PM
hah yeah, when you programmatically request that URL, the response appears to be the page source, but zipped up. I was screwing around with the WebClient class and was able to grab one of my own known html pages with no problem, and was getting the same garbled result you mention on the nhl.com boxscore. I'm a total novice with web development so I really don't know what's possible and what's not, tbh. So I took the nhl.com url and tossed it into lynx, the unix text browser, to try to get rid of any special formatting and shit that might be going on, and it flashed a line about running gzip before displaying the data, which seemed out of place. From there I used the webclient method to save the data to a file to see what would happen, and voila, the page data could be opened by a zip client. Problem solving at its... most obtuse? :D
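For anyone who wants to confirm this without a zip client: a gzip stream always starts with the two bytes 1F 8B (hex), so the downloaded bytes can be checked directly. A minimal VB.NET sketch follows. `LooksLikeGzip` and the module name `GzipCheck` are made-up names for illustration; only `WebClient.DownloadData` comes from the framework.

```vbnet
Imports System.Net

Module GzipCheck

    ' Hypothetical helper: True when the buffer starts with the gzip magic bytes 1F 8B
    Function LooksLikeGzip(ByVal data As Byte()) As Boolean
        Return data IsNot Nothing AndAlso data.Length >= 2 _
            AndAlso data(0) = &H1F AndAlso data(1) = &H8B
    End Function

    Sub Main()
        Dim client As New WebClient()
        Dim data As Byte() = client.DownloadData("http://www.nhl.com/scores/htmlreports/20092010/GS010019.HTM")

        If LooksLikeGzip(data) Then
            Console.WriteLine("Response is gzip-compressed")
        Else
            Console.WriteLine("Response looks like plain text")
        End If
    End Sub

End Module
```

This only inspects the raw bytes, so it does not depend on what headers the server sends.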

Draft Dodger
09-18-2009, 06:28 PM
well, that's really freaking bizarre. and annoying. I'm not sure I'm up for handling a compressed file.

Radii
09-18-2009, 06:35 PM
My first VB program ever! I do think it needs VB.NET 2008 (as opposed to 2005 or 2003... I have the professional full Visual Studio 2008 here) for the compression class stuff:

Imports System.Net
Imports System.IO
Imports System.IO.Compression

Module Module1

    Sub Main()
        Dim GetWeb As New WebClient()
        Dim strURL As String = "http://www.nhl.com/scores/htmlreports/20092010/GS010019.HTM"

        ' Download the raw (gzip-compressed) response bytes
        Dim fileData As Byte() = GetWeb.DownloadData(strURL)

        Dim strResult As String = UnZip(fileData)

        Console.WriteLine(strResult)

        ' Wait for input so the console window stays open
        Console.Read()
    End Sub

    ' Decompresses a gzip buffer and returns the result as ASCII text
    Function UnZip(ByVal compressedBuffer As Byte()) As String
        Dim gzip As New GZipStream(New MemoryStream(compressedBuffer), CompressionMode.Decompress)

        Dim decompressedBuffer As Byte() = ReadAllBytes(gzip)

        Return System.Text.ASCIIEncoding.ASCII.GetString(decompressedBuffer)
    End Function

    ' Reads a stream to the end in 4 KB chunks and returns all of its bytes
    Function ReadAllBytes(ByVal stream As Stream) As Byte()
        Dim buffer(4095) As Byte

        Dim ms As New MemoryStream()

        Dim bytesRead As Integer = 0

        Do
            bytesRead = stream.Read(buffer, 0, buffer.Length)

            If bytesRead > 0 Then
                ms.Write(buffer, 0, bytesRead)
            End If
        Loop While bytesRead > 0

        Return ms.ToArray()
    End Function

End Module

Radii
09-18-2009, 06:38 PM
dola, the "UnZip" and "ReadAllBytes" functions I stole from msdn articles and their comments. If you use WebClient's "DownloadData" call, it seems to feed just fine into this. The result you should get when you run this should be a cmd window with the raw text HTML.

Draft Dodger
09-18-2009, 06:42 PM
wow, thank you.
I'll give that a try

Draft Dodger
09-19-2009, 08:19 AM
thanks Radii. what you did for me worked (in VS 2005 to boot).

Draft Dodger
09-19-2009, 08:21 AM
the totally awesome part is that it appears that some of the pages are compressed and some are not. I think someone is trying to tell me to scrap it.
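One possible way to cope with that mix, sketched under assumptions: `HttpWebRequest` has had an `AutomaticDecompression` property since .NET 2.0. It asks the server for gzip/deflate and unpacks the response when the server labels it with a `Content-Encoding` header, while uncompressed responses pass through untouched. Nothing in this thread confirms that nhl.com labels its responses that way, so treat this as something to try rather than a guaranteed fix. The function name reuses `WebPageToText` from the first post.

```vbnet
Imports System.IO
Imports System.Net

Module Module1

    ' Sketch: returns the page source, letting the framework undo gzip/deflate
    ' when the response carries a Content-Encoding header (assumed, not verified here)
    Function WebPageToText(ByVal url As String) As String
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate

        Using response As WebResponse = request.GetResponse()
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        Console.WriteLine(WebPageToText("http://www.nhl.com/scores/htmlreports/20092010/GS010019.HTM"))
    End Sub

End Module
```

If the header turns out to be missing, the fallback is to download the bytes with `DownloadData`, check whether the first two are 1F 8B (hex), and only then pass them through Radii's `UnZip`.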