#1
Coordinator
Join Date: Jan 2001
Location: Keene, NH
VB help
Hopefully there's someone versed enough in VB.NET who can help me here.
I have a little program that grabs data from a web page, and I'm running into trouble with a core function. I think I know what's wrong; I just don't know how to resolve it. Here's an example of a page I would be opening: Game Summary, and the relevant function to dump that into one big text file:
That worked fine last year. This year, when I run the same function on a newer page, instead of the entire source code I get a couple of gibberish characters. I think the issue is that there's now a null line at the beginning of the source code on the new page that the old page didn't have, but I'm not really sure how to resolve this. I've tried reading it line by line, but I still just get lines of gibberish. My coding skills have never been great, and I'm rusty as it is, so maybe I'm missing something really obvious here. Any help would be greatly appreciated.
__________________
Mile High Hockey
#2
Grizzled Veteran
Join Date: Dec 2002
Location: Little Rock, AR
Try this...
Dim read As New IO.StreamReader(webStream, System.Text.Encoding.Default)
__________________
Xbox 360 Gamer Tag: GoldenEagle014
#3
Coordinator
Join Date: Jan 2001
Location: Keene, NH
Hmmm. I made that change and now I get different garbled text.
I also tried to do it line by line in a little 100-count For loop (text = reader.ReadLine()), and I get about 25 lines of garbage and then 75 lines of text = Nothing, even though the source code is much longer than 25 lines.
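StreamReader.ReadLine returns Nothing once the stream is exhausted, so a fixed 100-pass loop just keeps getting Nothing after the real data runs out. If the bytes coming back are not plain text, the reader splits them on whatever bytes happen to look like CR/LF, which would plausibly produce a few dozen "lines" of garbage before the end. A minimal sketch of the usual read-until-Nothing loop; the module and function names (LineReading, ReadAllLines) are hypothetical:

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Public Module LineReading

    ' Hypothetical helper: reads lines until ReadLine returns Nothing,
    ' which is how StreamReader reports that the stream has run out.
    Public Function ReadAllLines(ByVal source As Stream, ByVal enc As Encoding) As List(Of String)
        Dim lines As New List(Of String)()
        Using reader As New StreamReader(source, enc)
            Dim line As String = reader.ReadLine()
            Do While line IsNot Nothing
                lines.Add(line)
                line = reader.ReadLine()
            Loop
        End Using
        Return lines
    End Function

End Module
```

Called as LineReading.ReadAllLines(webStream, System.Text.Encoding.Default), the loop stops on its own at the end of the data; reader.ReadToEnd() is the one-call alternative when you want everything in a single string.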
Last edited by Draft Dodger : 09-18-2009 at 02:49 PM.
#4
Head Coach
Join Date: Jul 2001
The data is compressed...
C#, not VB, but hopefully easy enough to follow:

    using System.Net;

    static void Main(string[] args)
    {
        WebClient webClient = new WebClient();
        const string strUrl = "http://www.nhl.com/scores/htmlreports/20092010/GS010019.HTM";
        // Save the raw response bytes to disk untouched
        webClient.DownloadFile(strUrl, "C:\\testfile.zip");
    }

I ran that as my entire program (with the necessary .NET stuff surrounding it, of course), and the resulting file can be opened by 7-Zip and has all the data you're looking for.
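7-Zip can recognize the saved file because gzip data carries a fixed signature: per RFC 1952, every gzip stream begins with the two bytes 0x1F 0x8B. A minimal sketch of checking for that signature on the bytes from WebClient.DownloadData before trying to decompress them; GZipDetection and LooksLikeGZip are hypothetical names:

```vb
Imports System

Public Module GZipDetection

    ' Hypothetical helper: RFC 1952 says every gzip stream begins with
    ' the two "magic" bytes 0x1F 0x8B, so check for them up front.
    Public Function LooksLikeGZip(ByVal data As Byte()) As Boolean
        Return data IsNot Nothing AndAlso data.Length >= 2 AndAlso data(0) = &H1F AndAlso data(1) = &H8B
    End Function

End Module
```

A plain HTML page starts with ordinary text such as "<html>", so it fails this check, while a gzipped response passes it.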
#5
Coordinator
Join Date: Jan 2001
Location: Keene, NH
It's compressed? *scratches head*
#6
Head Coach
Join Date: Jul 2001
Do you have .NET 3.5 or an older version? It looks like there is a "GZipStream" class that was added in the 2008 version of things. I've got it working in C#; it's copied from some MSDN examples, so hopefully it isn't too big a deal to get it going in VB too.
#7
Coordinator
Join Date: Jan 2001
Location: Keene, NH
Well, you lost me. Are you saying the page I'm downloading is compressed?
#8
Head Coach
Join Date: Jul 2001
Hah, yeah: when you programmatically request that URL, the response appears to be the page source, but zipped up. I was screwing around with the WebClient class and was able to grab one of my own known HTML pages with no problem, but I got the same garbled result you mention on the nhl.com box score. I'm a total novice with web development, so I really don't know what's possible and what's not, tbh. So I took the nhl.com URL and tossed it into lynx, the Unix text browser, to try to get rid of any special formatting and shit that might be going on, and it flashed a line about running gzip before displaying the data, which seemed out of place. From there I used the WebClient method to save the data to a file to see what would happen, and voila, the page data could be opened by a zip client. Problem solving at its... most obtuse?
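Lynx announcing gzip suggests, though the thread never shows the actual headers, that the server labels the response with a "Content-Encoding: gzip" header. If that assumption holds, .NET can undo the compression itself through HttpWebRequest.AutomaticDecompression, which asks for gzip/deflate and decompresses responses marked that way. A minimal sketch; HttpFetch, CreateRequest, and DownloadPage are hypothetical names:

```vb
Imports System
Imports System.IO
Imports System.Net

Public Module HttpFetch

    ' Hypothetical helper: builds a request that advertises gzip/deflate and
    ' lets the framework decompress any response marked with Content-Encoding.
    Public Function CreateRequest(ByVal url As String) As HttpWebRequest
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        Return request
    End Function

    ' Downloads the page and returns it as text (needs network access;
    ' StreamReader decodes as UTF-8 unless a byte order mark says otherwise).
    Public Function DownloadPage(ByVal url As String) As String
        Using response As WebResponse = CreateRequest(url).GetResponse()
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

End Module
```

This only helps when the header is present; if a server hands back gzipped bytes without labeling them, the response arrives still compressed and has to go through GZipStream by hand.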
#9
Coordinator
Join Date: Jan 2001
Location: Keene, NH
Well, that's really freaking bizarre. And annoying. I'm not sure I'm up for handling a compressed file.
#10
Head Coach
Join Date: Jul 2001
My first VB program ever! I do think it needs VB.NET 2008 (as opposed to 2005 or 2003; I have the full Visual Studio 2008 Professional here) for the compression class stuff:
    Imports System
    Imports System.IO
    Imports System.IO.Compression
    Imports System.Net

    Module Module1

        Sub Main()
            Dim GetWeb As New WebClient()
            Dim strURL As String = "http://www.nhl.com/scores/htmlreports/20092010/GS010019.HTM"
            ' Grab the raw (gzip-compressed) bytes rather than a string
            Dim fileData As Byte() = GetWeb.DownloadData(strURL)
            Dim strResult As String = UnZip(fileData)
            Console.WriteLine(strResult)
            Console.Read()
        End Sub

        ' Decompress a gzip buffer and decode it as ASCII text
        ' (any non-ASCII byte comes out as "?")
        Function UnZip(ByVal compressedBuffer As Byte()) As String
            Using gzip As New GZipStream(New MemoryStream(compressedBuffer), CompressionMode.Decompress)
                Dim decompressedBuffer As Byte() = ReadAllBytes(gzip)
                Return System.Text.Encoding.ASCII.GetString(decompressedBuffer)
            End Using
        End Function

        ' Copy a stream into a byte array, 4 KB at a time, until Read returns 0
        Function ReadAllBytes(ByVal stream As Stream) As Byte()
            Dim buffer(4095) As Byte
            Dim ms As New MemoryStream()
            Dim bytesRead As Integer
            Do
                bytesRead = stream.Read(buffer, 0, buffer.Length)
                If bytesRead > 0 Then
                    ms.Write(buffer, 0, bytesRead)
                End If
            Loop While bytesRead > 0
            Return ms.ToArray()
        End Function

    End Module

Last edited by Radii : 09-18-2009 at 06:38 PM.
#11
Head Coach
Join Date: Jul 2001
Dola: the "UnZip" and "ReadAllBytes" functions I stole from MSDN articles and their comments. If you use WebClient's "DownloadData" call, its output seems to feed just fine into them. When you run this, you should get a cmd window showing the raw HTML text.
#12
Coordinator
Join Date: Jan 2001
Location: Keene, NH
Wow, thank you.
I'll give that a try.
#13
Coordinator
Join Date: Jan 2001
Location: Keene, NH
Thanks, Radii. What you did for me worked (in VS 2005, to boot).
#14
Coordinator
Join Date: Jan 2001
Location: Keene, NH
The totally awesome part is that some of the pages appear to be compressed and some are not. I think someone is trying to tell me to scrap it.
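Because gzip data always starts with the same two bytes (0x1F 0x8B, per RFC 1952), one way to cope with a mix of compressed and uncompressed pages is to sniff those bytes and only decompress when they match. A minimal sketch; PageDecoder and DecodePage are hypothetical names, and the caller chooses the text encoding:

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Public Module PageDecoder

    ' Hypothetical helper: returns the page text whether or not the server
    ' gzipped it. Data that starts with the gzip signature 0x1F 0x8B is
    ' decompressed first; anything else is decoded as-is.
    Public Function DecodePage(ByVal data As Byte(), ByVal enc As Encoding) As String
        If data.Length >= 2 AndAlso data(0) = &H1F AndAlso data(1) = &H8B Then
            Using reader As New StreamReader(New GZipStream(New MemoryStream(data), CompressionMode.Decompress), enc)
                Return reader.ReadToEnd()
            End Using
        End If
        Return enc.GetString(data)
    End Function

End Module
```

In Radii's program this would stand in for UnZip, e.g. Dim strResult As String = PageDecoder.DecodePage(GetWeb.DownloadData(strURL), System.Text.Encoding.ASCII), so the same code path handles both kinds of page.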