Combine / Bar Data? [Archive] - Front Office Football Central

gstelmack

08-23-2012, 10:39 AM

I'm taking a machine learning class right now, and would love to experiment with some improvements to Draft Analyzer. To do this what I'd want is draft class data that we have now with Extractor, but combined with eventual peak bar profiles / cur-fut ratings. I could generate this with Extractor on a test league or two (or three), but that will take a while, and my attempts to automate it in the past have failed (I actually got pretty close with adding a "sim seasons" feature to Extractor folks could use to generate history, but had problems programmatically finding some of the dialog buttons that needed to be pressed to advance the season).

Do any of the leagues have this info available? I know Ben added season-by-season cur/fut tracking in the WOOF database, but the combines never got in there. IHOF has combines, but I'm not sure how much else. Does anyone have a dataset that matches this?

aston217

08-23-2012, 12:23 PM

Is this the Coursera machine learning class, by the way? That's cool stuff. Professor Ng is a great teacher.

I have some far-fetched dreams of creating my own database-to-HTML system that would include all the information you are looking for, so long as 1) all the draft class extractors are uploaded and 2) all the changetrackers are as well.

Probably months away from ever getting that done, if it gets there, but I think that would give you what you're looking for. It'd put it all in its own auxiliary database in a table with the following columns: ID, FirstName, LastName, combines, prTC-cur, prTC-fut, postTC-cur, postTC-fut, prFA-cur, prFA-fut, postFA-cur, postFA-fut (the last eight being arrays if that's possible). A few calculations would get peak data from there.
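As a rough illustration of the "few calculations" for peak data, here is a minimal sketch. It assumes the per-season ratings end up as plain lists, one entry per season; the class and field names are hypothetical and only loosely mirror the column names above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlayerHistory:
    # Hypothetical record; the list fields hold one rating per season.
    player_id: int
    first_name: str
    last_name: str
    post_tc_cur: List[int] = field(default_factory=list)
    post_tc_fut: List[int] = field(default_factory=list)


def peak(values: List[int]) -> Optional[int]:
    """Return the highest rating seen, or None if no seasons were recorded."""
    return max(values) if values else None


# Made-up numbers, purely for illustration.
p = PlayerHistory(1, "Test", "Player",
                  post_tc_cur=[31, 44, 52, 50],
                  post_tc_fut=[60, 58, 55, 52])
print(peak(p.post_tc_cur), peak(p.post_tc_fut))
```

The same `peak` helper would apply to any of the eight array columns.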

While I'm at it though, may as well let me know if there's anything else you think would be a useful amalgamation of data.

Nemesis

08-23-2012, 07:47 PM

Hey greg, do you think this could be used to determine the weight of what stats the individual bars generate? Similar to jeffrey's study.

gstelmack

08-24-2012, 08:20 AM

Is this the Coursera machine learning class, by the way? That's cool stuff. Professor Ng is a great teacher.

Yes, that is the one. I've only done week 1, but yes he seems to be a fantastic teacher, and the course is set up pretty well. We'll see how I feel by week 10, but I am thoroughly enjoying it so far.

I know many of the database bits are there: WOOF has all the changetracker bits, for example, while IHOF has combine data, I'm just not sure anyone has both for any significant number of players. I'll generate it if I have to, if no one else has it already, but I thought I'd check first.

And yes, this could be used to figure out how FOF weights bars into cur/fut, but of course different GMs have different bars they look for (Ben's cur/fut formula for WRs is probably very different from Jim's :D )

Ben E Lou

08-24-2012, 08:42 AM

Greg, I'm pretty sure I have all the FOFroster.csv files and full bar data for the entire history of the CCFL (around 16 seasons).

I also have a fair bit for WOOF going back as well, but there are some discrepancies there because of duplicate names that were around for a while. The CCFL data would be much "cleaner."

thenewchuckd

08-24-2012, 10:32 AM

Just to add a bit of complexity. It is fairly clear that many players do not fully unmask. And it has even been suggested that unmasking has a random component, so some will unmask faster than others.

For example, it is starting to be generally accepted that an 80-rated corner who goes -10 in his first camp (unmasking, not vol) and ends up 55-rated is probably much worse than that. And if you rerun various stages, it becomes pretty clear that there is a random component to unmasking (although my personal feeling is that the random component is generally not huge).

If you combine players not unmasking fully with random unmasking, it has the potential to make this analysis much more difficult, if not impossible.

gstelmack

08-24-2012, 10:37 AM

Just to add a bit of complexity. It is fairly clear that many players do not fully unmask. And it has even been suggested that unmasking has a random component, so some will unmask faster than others.

Understood. It will be a fun exercise anyway, and we'll see if it produces much that is usable. It won't be perfect, but where would the fun be in that? If it can add one more tool to the arsenal to help automate some of the draft filtering, I'm going to go for it, as I just don't have the time to pore over my draft lists like I used to.

thenewchuckd

08-25-2012, 08:05 AM

Something else that would help, if you can at all, would be to eliminate any players who had some obvious volatility effect. Just thinking about it yesterday and volatility is probably going to hamper your investigation more than anything.

The trouble with that is: for me it is becoming less and less clear which players had a vol effect and which are just unmasking. Yes, if a guy goes -30 or more in one training camp, that is clearly volatility. But what about plus or minus 10?

Also, I have had some players recently who were clearly going to be creepers but had camps of -2/+0 (current/future).

Anyhow, it is probably going to add some more noise but I am just saying if you could take away the clear volatility cases, it might help. But maybe it is not possible.

gstelmack

08-25-2012, 08:44 AM

It depends. I can feed volatility in as a feature as well, but yes, throwing out guys that are more than +/- 10, or putting them in their own set, would be interesting. However, there are typically only a handful of guys with meaningful volatility bumps, so they can be considered just noise in the data.
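A minimal sketch of the "own set" idea, assuming each player's history is just a list of overall ratings, one per season. The function name, the dict layout, and the default threshold of 10 are all assumptions for illustration, and a single-number cutoff cannot tell unmasking from volatility by itself.

```python
def split_by_swing(histories, threshold=10):
    """Split players into (steady, volatile) by their largest season-to-season change.

    histories maps a player id to a list of overall ratings, one per season.
    A player goes into `volatile` if any single change exceeds `threshold`.
    """
    steady, volatile = {}, {}
    for pid, ratings in histories.items():
        swings = [abs(b - a) for a, b in zip(ratings, ratings[1:])]
        biggest = max(swings, default=0)
        target = volatile if biggest > threshold else steady
        target[pid] = ratings
    return steady, volatile


# Made-up ratings, purely for illustration.
histories = {1: [50, 55, 58], 2: [80, 55, 54], 3: [40, 50]}
steady, volatile = split_by_swing(histories)
print(sorted(steady), sorted(volatile))
```

Keeping the volatile group as a separate set, rather than discarding it, leaves the option of training on it later.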

The trickier part will be dealing with position switches. I may have to throw those out, or put them in their own data set: for a draftee RB, what might his future WR career be?

thenewchuckd

08-25-2012, 09:13 AM

but yes, throwing out guys that are more than +/- 10,

A few thoughts about this.

-A rule of +/- 10 is tough. Some very heavily masked players will unmask more than that at training camp. For droppers, -10 or more is actually quite common. And I have seen more than +10 often enough.
-I assume you are talking only about the initial training camp here. I say this because it is quite clear to me that volatility happens much more often than just at the first training camp. The thing with the non-TC movements is that they are often in the +6 to +10 range (or -6 to -10 range), although exceptionally they can be much larger than that.
-It is difficult if not impossible to tell if a player is unmasking or experiencing vol, without knowing his full development history.
-Maybe this is what you mean - that a movement of 6 to 10 points in overall rating is not huge. But watch guys who have vol hits/gains. They are usually not equal across all bars. Some bars can have zero movement while others can move 20 points. If the goal is to measure individual bars vs combines, this can still be huge (this is another way to tell vol from unmasking, by the way, unmasking is never that extreme).
-These non-TC movements happen more often than some would think/suggest. I have seen the same player get hit more than once over his career, sometimes in opposite directions.

To sum up, do not underestimate volatility.

cuervo72

08-25-2012, 11:07 AM

If you need any additional files, I have a number of old ones from FOFL and IHOF too. I think they typically represent FA/pre-camp, post-camp, year end.

gstelmack

08-25-2012, 05:35 PM

If you need any additional files, I have a number of old ones from FOFL and IHOF too. I think they typically represent FA/pre-camp, post-camp, year end.

Any data set is good to me if:

- It is FOF2k7 generated.
- It includes pre-draft combines and bar ranges.
- It contains the peak development history for the player(s). I'll go with cur/fut if that's all we have, but bar profiles along the way would be a big help.

Mostly I'll be feeding in the pre-draft combines and bar profiles and the max cur/fut/bar profile for the players, and using them as a training data set to try and develop a prediction algorithm. So if the data set has those two things and is FOF2k7, I can use it. The more data, the better.
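To make the training-set idea concrete, here is a toy sketch of fitting one input to one target with batch gradient descent on mean squared error. This is only a stand-in under stated assumptions, not the actual Draft Analyzer approach: the single feature, the data values, the learning rate, and the function name are all made up for illustration.

```python
def fit_line(xs, ys, lr=0.05, steps=5000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m
        grad_b = sum(errors) / m
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up data lying exactly on y = 2x + 1.
# Think of x as a scaled combine score and y as a scaled peak rating.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

A real feature set would have many inputs per player (all the combines plus the pre-draft bar ranges), which is where scaling the features and using a vectorized library would start to matter.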

cuervo72

08-26-2012, 01:48 PM

Ok...I'll see what I can put together. I have combines for I believe all of FOFL and IHOF. That's all in one place as I stock a table with that info, basically:

year/class id/fname/lname/pos/b_rat/school/vol/sole/speed/strength/agility/jump/postest/dev/selected/player id

For ratings, what I have done for the sites is to run Extractor, then I run them through two scripts - one for overall ratings (all stored) and one for bar footprints (overwritten to show current data). The second set is basically what is in extractor, it just filters out some of the info and assigns the db ids. It's basically:

player/pos_name/pos_group/mentor/cur_total/fut_total/formations/cur_01/fut_01...cur_15/fut_15

I'm thinking the master combine list and the latter dumps could be parsed pretty easily to grab max bars, linking by ID.
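A rough sketch of that parse, assuming both the combine list and the rating dumps are available as comma-separated text with header rows. The header names used here (player_id, player, cur_total, cur_01 and so on) are a trimmed-down, hypothetical version of the column lists above, and the data is made up.

```python
import csv
import io

# Hypothetical, trimmed-down combine list.
COMBINES = """player_id,fname,lname,pos,speed,strength
101,Al,Smith,RB,4.45,18
102,Bo,Jones,WR,4.52,12
"""

# Hypothetical rating dumps, one per extract.
DUMPS = [
    """player,cur_total,fut_total,cur_01,fut_01
101,40,62,35,70
102,30,45,28,50
""",
    """player,cur_total,fut_total,cur_01,fut_01
101,55,60,52,66
102,33,41,30,44
""",
]


def peak_by_player(dump_texts):
    """Return {player id: {column: max value seen across all dumps}}."""
    peaks = {}
    for text in dump_texts:
        for row in csv.DictReader(io.StringIO(text)):
            best = peaks.setdefault(row["player"], {})
            for col, val in row.items():
                if col == "player":
                    continue
                best[col] = max(best.get(col, 0), int(val))
    return peaks


def build_training_rows(combine_text, dump_texts):
    """Join each combine row to its peak ratings, linking by player id."""
    peaks = peak_by_player(dump_texts)
    rows = []
    for c in csv.DictReader(io.StringIO(combine_text)):
        pid = c["player_id"]
        if pid in peaks:  # skip draftees with no rating history
            rows.append({**c, **{"peak_" + k: v for k, v in peaks[pid].items()}})
    return rows


rows = build_training_rows(COMBINES, DUMPS)
print(rows[0]["lname"], rows[0]["peak_cur_total"], rows[0]["peak_fut_total"])
```

Taking the max per column independently means the peak cur and peak fut for a player can come from different seasons, which may or may not be what is wanted.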

thenewchuckd

08-27-2012, 04:00 PM

Ok, thinking about it further, I am wondering if this approach could be refined. If I can convince you of the error vol and unmasking will introduce, maybe this would be the better way to go.

Go the route that others have taken, but apply your modelling approach. Generate your own draft classes, using the .csv file method. This way, you can have complete control over the bars and see the combines that come out.

The limitation here is you cannot control every bar but... I think it has more potential to deliver something useful.

gstelmack

08-28-2012, 10:57 AM

Go the route that others have taken, but apply your modelling approach. Generate your own draft classes, using the .csv file method. This way, you can have complete control over the bars and see the combines that come out.

The limitation here is you cannot control every bar but... I think it has more potential to deliver something useful.

That's a good idea. Basically generate the results set that I expect in CSV, then import the draft class to get the feature set from the combines and maybe some other detail. Even if I just generate random values for the CSV, I ought to be able to run this fast enough to generate a ton of data quickly. Good idea.
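A minimal sketch of the random-values step, assuming a simple comma-separated layout with one row per player. The column names, the 0-100 range, and the function name are hypothetical and are not the real draft class import format, which would need to be matched before anything could be imported.

```python
import csv
import io
import random

# Hypothetical bar columns; the real import file defines its own set.
BAR_COLUMNS = ["bar_01", "bar_02", "bar_03"]


def random_class_csv(n_players, seed=0):
    """Return CSV text with one row of random bar values per player."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id"] + BAR_COLUMNS)
    for pid in range(1, n_players + 1):
        writer.writerow([pid] + [rng.randint(0, 100) for _ in BAR_COLUMNS])
    return buf.getvalue()


text = random_class_csv(5)
print(text)
```

Seeding the generator means the known "answer" bars for a class can be regenerated later and matched against whatever combines come out.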

gstelmack

10-13-2012, 07:01 PM

That's a good idea. Basically generate the results set that I expect in CSV, then import the draft class to get the feature set from the combines and maybe some other detail. Even if I just generate random values for the CSV, I ought to be able to run this fast enough to generate a ton of data quickly. Good idea.

This doesn't look like it's going to work as well as I had hoped, because the CSV imports for the draft file only have FOF2k4 bars, not some of the newer FOF2k7 bars. For example, there is a single Pass Rush rating for DEs, but it's not separated into Pass Rush Technique and Pass Rush Strength. Plus, there is no Play Diagnosis for defenders.

I'm going to have to think this through, I may be stuck doing an extract of a draft class, then looking at what everyone's bars are four years later, or looking at any data leagues already have for this.