Scraping the CSP Mysam website for my soccer schedule
One of the first things I had to do when starting this project was scrape the information from the page. Scraping, in this context, simply means identifying what information I want, working out where it’s stored, and then collecting it together in some fashion. Identifying where the information is stored can sometimes be the most difficult part of the scraping process, and how long it takes depends heavily on how well the site is put together.
Luckily, the CSP website was fairly well put together, so it wasn’t that difficult to do. First off, I created a function to house all of the scraping logic. I called it collectschedule(), as is appropriate.
function collectschedule() {
    // Header text looks like 'Team *team name*'; slice(5) drops the leading "Team ".
    var name = $j(".header")[0].innerText;
    var teamname = name.slice(5);

    // First element with class table-data holds the schedule.
    var scheduletable = $j(".table-data")[0];
    var totalrows = scheduletable.children[0].childElementCount;
    var finalschedule = [];
    var game = 1;

    // Start at row 1 and visit every second row.
    for (var rownumber = 1; rownumber < totalrows; rownumber += 2) {
        var schedule = {};
        var tablerows = scheduletable['children'][0];
        var tablerow = tablerows.children[rownumber];
        var datetime = tablerow.children[0].innerText;
        var comp = tablerow.children[2].innerText;
        var tshirt = tablerow.children[2].innerHTML;

        // Default to a home game; if the cell's HTML does not begin
        // with a tag, treat the game as away.
        var tshirtcolor = "- Home";
        if (tshirt.indexOf("<") != 0) {
            tshirtcolor = "- Away";
        }
        // For away games, drop the first three characters of the opponent text.
        if (tshirtcolor == "- Away") {
            comp = comp.slice(3);
        }

        // adjustdate() is a helper that is not shown in this section.
        datetime = adjustdate(datetime);

        schedule['game'] = game;
        schedule['datetime'] = datetime;
        schedule['competitor'] = comp;
        schedule['tshirtcolor'] = tshirtcolor;
        schedule['teamname'] = teamname;
        finalschedule.push(schedule);
        game = game + 1;
    }
    console.log(finalschedule);
    return finalschedule;
}
This function starts out by setting a bunch of variables. The first one, name, is the text of the first element in the list of elements with class header; in programming, the first position is position zero, which is what the zero in brackets designates. The class is designated by the ‘.’ and, as elsewhere in this code, the variable is declared with var at the beginning. The variable name holds the first line, ‘Team *team name*’. The second variable, teamname, is that same line sliced so that it only includes *team name*. Then I create a variable that identifies the table where the schedule lives (scheduletable), the total number of rows in the table (totalrows), an empty list called finalschedule, and a counter called game. That last variable, game, is used to identify which game of the season is being played.
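To make the slicing step concrete, here is a small sketch that runs outside the browser. The header text "Team Flaming Lunchbox" is a made-up stand-in for whatever the page's .header element actually contains.

```javascript
// Hypothetical header text; on the real page this would come from
// $j(".header")[0].innerText.
var name = "Team Flaming Lunchbox";

// "Team " is five characters long, so slice(5) drops it and keeps the rest.
var teamname = name.slice(5);

console.log(teamname); // prints "Flaming Lunchbox"
```

Note that slice(5) only works as long as the header always begins with exactly "Team " followed by the name.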
The most important data to collect comes next: the actual schedule. The schedule is in a table, with each cell of the table filled with the appropriate information. In order to collect this information correctly, I made a for loop that steps through the table, starting at row 1 and moving two rows at a time, and collects the necessary data from each row it visits. Each row’s data is then put into its own dictionary (a plain JavaScript object). After each row is collected, that small dictionary is added to the list finalschedule.
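The same loop pattern can be tried without the page by swapping the table for a plain array. This is only a sketch: the rows array, its contents, and the helper name collectRows are all made up for illustration, and it assumes the game data sits in rows 1, 3, 5 and so on, as the row stepping in collectschedule() suggests.

```javascript
// Made-up stand-in for the table: each inner array is one row of cells.
// Assumption: the rows of interest are rows 1, 3, 5, ...
var rows = [
    ["Date", "Field", "Opponent"],
    ["Mon 7:00 PM", "Field 1", "Red Rockets"],
    ["", "", ""],
    ["Wed 8:00 PM", "Field 2", "Blue Comets"]
];

// Hypothetical helper mirroring the loop in collectschedule().
function collectRows(rows) {
    var finalschedule = [];
    var game = 1;
    for (var rownumber = 1; rownumber < rows.length; rownumber += 2) {
        var row = rows[rownumber];
        var schedule = {};              // one small dictionary per row
        schedule['game'] = game;
        schedule['datetime'] = row[0];  // first cell
        schedule['competitor'] = row[2]; // third cell
        finalschedule.push(schedule);   // add it to the list
        game = game + 1;
    }
    return finalschedule;
}

var result = collectRows(rows);
console.log(result.length);        // prints 2
console.log(result[1].competitor); // prints "Blue Comets"
```

Each pass builds a fresh object, so the entries in finalschedule stay independent of one another.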
Up next time: A continuation of my for loop in the collectschedule() function
 
       2012 FlamingLunchbox unless otherwise noted.