Front Office Football Central  

Old 09-18-2009, 12:42 PM   #1
Draft Dodger
Coordinator
 
Join Date: Jan 2001
Location: Keene, NH
VB help

hopefully there's someone versed enough in VB.NET who can help me here.

I have a little program that grabs data from a web page. I'm running into trouble with a core function. I think I know what's wrong, I just don't know how to resolve it.

here's an example of a page I would be opening:
Game Summary

and the relevant function to dump that into one big text file


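Function WebPageToText(ByVal url As String) As String
    Dim GetWeb As New Net.WebClient()
    ' Open a read stream on the page; without this, webStream is Nothing
    Dim webStream As IO.Stream = GetWeb.OpenRead(url)

    ' Read the whole response as text
    Dim reader As New IO.StreamReader(webStream)
    Dim text As String = reader.ReadToEnd()
    reader.Close()

    Return text
End Function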


that worked fine last year. this year, when I try to run the same function on a newer page, instead of the entire source code, I get a couple of gibberish characters. I think the issue is that there's now a null line at the beginning of the new page's source that the old page didn't have, but I'm not really sure how to resolve it. I've tried reading it line by line, but I still just get lines of gibberish. My coding skills have never been great, and I'm rusty as it is, so maybe I'm missing something really obvious here. Any help would be greatly appreciated.
__________________
Mile High Hockey

Old 09-18-2009, 12:54 PM   #2
GoldenEagle
Grizzled Veteran
 
Join Date: Dec 2002
Location: Little Rock, AR
Try this...

Dim reader As New IO.StreamReader(webStream, System.Text.Encoding.Default)
__________________
Xbox 360 Gamer Tag: GoldenEagle014
Old 09-18-2009, 02:47 PM   #3
Draft Dodger
Coordinator
 
Join Date: Jan 2001
Location: Keene, NH
hmmm. I made that change and now I get different garbled text.

I also tried reading it line by line in a little 100-iteration For loop (text = reader.ReadLine()), and I get about 25 lines of garbage and then 75 lines where text = Nothing, even though the source code is much longer than 25 lines
Last edited by Draft Dodger : 09-18-2009 at 02:49 PM.
Old 09-18-2009, 05:29 PM   #4
Radii
Head Coach
 
Join Date: Jul 2001
The data is compressed...

C#, not VB but hopefully easy enough to see:

static void Main(string[] args)
{
    WebClient webClient = new WebClient();

    const string strUrl = "http://www.nhl.com/scores/htmlreports/20092010/GS010019.HTM";

    // Save the raw response bytes to disk, untouched
    webClient.DownloadFile(strUrl, "C:\\testfile.zip");
}


I ran that as my entire program(with the necessary .NET stuff surrounding it of course), and the resulting file can be opened by 7-zip and has all the data you're looking for.
Old 09-18-2009, 05:31 PM   #5
Draft Dodger
Coordinator
 
Join Date: Jan 2001
Location: Keene, NH
it's compressed? *scratches head*
Old 09-18-2009, 06:05 PM   #6
Radii
Head Coach
 
Join Date: Jul 2001
do you have .NET 3.5 or an older version? It looks like there is a "GZipStream" class that was added in the 2008 version of things. I've got it working in C#, it's copied from some MSDN examples so it hopefully isn't too big a deal to get it going in VB too.
Old 09-18-2009, 06:17 PM   #7
Draft Dodger
Coordinator
 
Join Date: Jan 2001
Location: Keene, NH
well, you lost me. are you saying the page I'm downloading is compressed?
Old 09-18-2009, 06:25 PM   #8
Radii
Head Coach
 
Join Date: Jul 2001
Quote:
Originally Posted by Draft Dodger
well, you lost me. are you saying the page I'm downloading is compressed?


hah yeah, when you programmatically request that URL, the response appears to be the page source, but zipped up. I was screwing around with the WebClient class and was able to grab one of my own known html pages with no problem, and was getting the same garbled result you mention on the nhl.com boxscore. I'm a total novice with web development so I really don't know what's possible and what's not, tbh.

So I took the nhl.com url and tossed it into lynx, the unix text browser, to try to get rid of any special formatting and shit that might be going on, and it flashed a line about running gzip before displaying the data, which seemed out of place. From there I used the WebClient method to save the data to a file to see what would happen, and voila, the page data could be opened by a zip client. Problem solving at its... most obtuse?
Old 09-18-2009, 06:28 PM   #9
Draft Dodger
Coordinator
 
Join Date: Jan 2001
Location: Keene, NH
well, that's really freaking bizarre. and annoying. I'm not sure I'm up for handling a compressed file.
Old 09-18-2009, 06:35 PM   #10
Radii
Head Coach
 
Join Date: Jul 2001
My first VB program ever! I do think it needs VB.NET 2008 (as opposed to 2005 or 2003... I have the professional full Visual Studio 2008 here) for the compression class stuff:

Imports System.Net
Imports System.IO
Imports System.IO.Compression

Module Module1

    Sub Main()
        Dim GetWeb As New WebClient()
        Dim strURL As String = "http://www.nhl.com/scores/htmlreports/20092010/GS010019.HTM"

        ' Download the raw (gzip-compressed) response bytes
        Dim fileData As Byte() = GetWeb.DownloadData(strURL)

        Dim strResult As String = UnZip(fileData)

        Console.WriteLine(strResult)

        ' Wait for input so the console window stays open
        Console.Read()
    End Sub

    ' Decompresses a gzip buffer and decodes the result as ASCII text
    Function UnZip(ByVal compressedBuffer As Byte()) As String
        Dim gzip As New GZipStream(New MemoryStream(compressedBuffer), CompressionMode.Decompress)

        Dim decompressedBuffer As Byte() = ReadAllBytes(gzip)

        Return System.Text.Encoding.ASCII.GetString(decompressedBuffer)
    End Function

    ' Copies a stream into a byte array, one chunk at a time
    Function ReadAllBytes(ByVal stream As Stream) As Byte()
        Dim buffer(4095) As Byte ' 4096 bytes; VB upper bounds are inclusive

        Dim ms As New MemoryStream()

        Dim bytesRead As Integer = 0

        Do
            bytesRead = stream.Read(buffer, 0, buffer.Length)

            If bytesRead > 0 Then
                ms.Write(buffer, 0, bytesRead)
            End If
        Loop While bytesRead > 0

        Return ms.ToArray()
    End Function
End Module

Last edited by Radii : 09-18-2009 at 06:38 PM.
Old 09-18-2009, 06:38 PM   #11
Radii
Head Coach
 
Join Date: Jul 2001
dola, the "UnZip" and "ReadAllBytes" functions I stole from msdn articles and their comments. If you use WebClient's "DownloadData" call, it seems to feed just fine into this. The result you should get when you run this should be a cmd window with the raw text HTML.
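
For what it's worth, a possible alternative to unzipping by hand is to let the framework do it. HttpWebRequest has an AutomaticDecompression property, and WebClient lets you get at the request by overriding GetWebRequest. The sketch below is only that: GzipWebClient and Demo are made-up names, not something from the thread, and it assumes the server labels its response with a Content-Encoding header, which nobody here has checked.

```vbnet
Imports System
Imports System.Net

' Hypothetical helper (not from the thread): a WebClient that asks the
' underlying HttpWebRequest to decompress gzip/deflate responses itself.
Public Class GzipWebClient
    Inherits WebClient

    Protected Overrides Function GetWebRequest(ByVal address As Uri) As WebRequest
        Dim request As WebRequest = MyBase.GetWebRequest(address)
        ' Only HTTP requests have the AutomaticDecompression property
        Dim http As HttpWebRequest = TryCast(request, HttpWebRequest)
        If http IsNot Nothing Then
            http.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        End If
        Return request
    End Function
End Class

Module Demo
    Sub Main()
        Dim client As New GzipWebClient()
        Try
            ' URL taken from the posts above; it may no longer resolve
            Dim html As String = client.DownloadString("http://www.nhl.com/scores/htmlreports/20092010/GS010019.HTM")
            Console.WriteLine(html)
        Catch ex As WebException
            Console.WriteLine("Download failed: " & ex.Message)
        End Try
    End Sub
End Module
```

If that assumption holds, the same client should also cope with pages that are not compressed at all, since the framework only decompresses when the response says it is compressed.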
Old 09-18-2009, 06:42 PM   #12
Draft Dodger
Coordinator
 
Join Date: Jan 2001
Location: Keene, NH
wow, thank you.
I'll give that a try
Old 09-19-2009, 08:19 AM   #13
Draft Dodger
Coordinator
 
Join Date: Jan 2001
Location: Keene, NH
thanks Radii. what you did for me worked (in VS 2005 to boot).
Old 09-19-2009, 08:21 AM   #14
Draft Dodger
Coordinator
 
Join Date: Jan 2001
Location: Keene, NH
the totally awesome part is that it appears that some of the pages are compressed and some are not. I think someone is trying to tell me to scrap it.
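
One way to live with a mix of compressed and uncompressed pages might be to peek at the first two bytes of whatever DownloadData returns. A gzip stream always starts with the bytes 1F 8B, so anything else can be decoded as plain text. This is a rough sketch, not tested against the NHL pages: PageDecoder, LooksGzipped, DecodePage and GzipText are made-up names, the ASCII decoding is carried over from the UnZip function earlier in the thread, and the demo compresses a string in memory so it runs without a network connection.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module PageDecoder

    ' True when the buffer starts with the two gzip magic bytes (1F 8B)
    Function LooksGzipped(ByVal data As Byte()) As Boolean
        Return data IsNot Nothing AndAlso data.Length >= 2 AndAlso data(0) = &H1F AndAlso data(1) = &H8B
    End Function

    ' Returns the page text, decompressing first only when needed
    Function DecodePage(ByVal data As Byte()) As String
        If Not LooksGzipped(data) Then
            Return Encoding.ASCII.GetString(data)
        End If

        Using gzip As New GZipStream(New MemoryStream(data), CompressionMode.Decompress)
            Using reader As New StreamReader(gzip, Encoding.ASCII)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    ' Demo helper only: gzip a string in memory to stand in for a compressed page
    Function GzipText(ByVal text As String) As Byte()
        Dim ms As New MemoryStream()
        Using gzip As New GZipStream(ms, CompressionMode.Compress)
            Dim raw As Byte() = Encoding.ASCII.GetBytes(text)
            gzip.Write(raw, 0, raw.Length)
        End Using
        ' ToArray still works after the stream has been closed
        Return ms.ToArray()
    End Function

    Sub Main()
        Dim html As String = "<html><body>Game Summary</body></html>"
        ' Both calls print the same line: once from plain bytes, once from gzipped bytes
        Console.WriteLine(DecodePage(Encoding.ASCII.GetBytes(html)))
        Console.WriteLine(DecodePage(GzipText(html)))
    End Sub
End Module
```

In the real program, the byte array from GetWeb.DownloadData(strURL) would go straight into DecodePage in place of the call to UnZip.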