ScraperWiki – Tools for Scraping Web Data

There are some incredible repositories of data out on the web, but you need tools to get that data into formats you can use in your projects. ScraperWiki is both a code repository and a community where programmers share their scraping code so that others can refine and adapt it to their needs. The tool you need for your project might be just a click away!

Here is a video that describes a basic Twitter scraper.
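To give a feel for what a basic scraper does, here is a minimal Python sketch. It is not ScraperWiki’s code or library; it uses only Python’s standard html.parser and csv modules, and the class TableScraper, the functions scrape_table and rows_to_csv, and the sample page are all hypothetical. It assumes the data you want sits in an HTML table and that the page has already been downloaded.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags, etc.) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows to CSV text, a format spreadsheets can open directly."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical sample page; a real scraper would download it first,
# e.g. with urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<table>
  <tr><th>Branch</th><th>Visits</th></tr>
  <tr><td>Main</td><td>1,204</td></tr>
  <tr><td>East</td><td>387</td></tr>
</table>
"""

if __name__ == "__main__":
    print(rows_to_csv(scrape_table(SAMPLE_HTML)))
```

CSV is just one possible target; once the rows are plain lists of strings they can go into a spreadsheet, a database, or whatever format your project needs.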

Taming chaotic library data – Google Refine

I am always on the lookout for new tools to help me conquer chaotic data. A colleague found an interesting Google project, Google Refine.

Google Refine

http://code.google.com/p/google-refine/

As librarians, we work with a wide variety of data sets (MARC, XML, Excel, text files) from a variety of sources (integrated library systems, vendor data, government statistical data, etc.). Manipulating data is a crucial skill, so I’m definitely going to play with this new tool.
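One way Refine tames messy values, such as the same publisher entered several different ways, is its clustering feature. Its documentation describes a “fingerprint” key-collision method: normalize each value into a key, and treat values that share a key as likely duplicates. The Python sketch below approximates that idea with the standard library only; it is an illustration, not Refine’s code, and the function names and sample publisher strings are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Normalize a value into a key so trivially different spellings collide."""
    key = value.strip().lower()
    # Fold accented characters to plain ASCII (e.g. "é" becomes "e").
    key = unicodedata.normalize("NFKD", key).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, then de-duplicate and sort the remaining words.
    key = re.sub(r"[^\w\s]", "", key)
    return " ".join(sorted(set(key.split())))


def cluster(values):
    """Group values that share a fingerprint; return only groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical publisher strings of the kind vendor records often contain.
PUBLISHERS = [
    "Oxford University Press",
    "oxford university press.",
    "Press, Oxford University",
    "Penguin Books",
]

if __name__ == "__main__":
    for group in cluster(PUBLISHERS):
        print(group)
```

Here the three Oxford variants share the key “oxford press university” and end up in one group, while “Penguin Books” stands alone and is dropped. In Refine you would review a candidate group like that before merging it into a single value.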

Converting Excel Data to a Table

In my job I deal with lots of lists of things; occasionally I feel like a list wrangler. I usually want to look at my data in several ways, so being able to sort and filter my lists quickly is key to my daily work. One simple way to do that is to convert a plain Excel list into a Table. Here’s how to turn a simple Excel list into a sortable, filterable Table (note: I’m using Excel 2007 here):

  1. Click anywhere in your Excel list.
  2. Go to the Home tab, find the Styles group, and click the Format as Table button. You’ll be presented with a variety of styling options…go ahead and pick your favorite.
  3. Excel will ask you to specify the range of cells to include in your table; it’s smart enough to guess what you might want based on what it sees. However, make sure the range includes all your data AND your column headers. If you have extraneous information at the beginning or end of your data (such as titles or explanatory notes), edit the cell range to leave it out. For example, say your list has a title in row 1. You don’t want the title in your table, so change the range so that it starts on row 2. It will look something like “=$A$2:$G$496”. (Excel presented me with this formula string…I simply changed $A$1 to $A$2.) Now your table will span cells A2 through G496.
    • Note: if you have column headings, make sure My table has headers is checked. If you don’t have column headings, Excel will create them for you, calling them “Column 1”, “Column 2”, etc.
  4. That’s it! Now you can easily re-sort your table by clicking the down arrow in the column you’d like to sort by. You can also filter the entire table by clicking a column’s down arrow and selecting or unselecting the values you’d like to see.
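If you ever need the same sort-and-filter in a script rather than in Excel, the steps above map fairly directly onto a few lines of Python. This is a minimal sketch under stated assumptions: the sheet has already been read into a list of rows (row 1 first), the range uses the same $A$2:$G$496 style Excel shows, and the names col_index, parse_range, make_table, sort_table and filter_table, along with the sample grid, are hypothetical. It does not open Excel files or reproduce Excel’s behavior exactly.

```python
import re


def col_index(letters):
    """Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_range(ref):
    """Parse an A1-style range such as "=$A$2:$G$496".

    Returns (first_col, first_row, last_col, last_row), all 1-based.
    """
    m = re.fullmatch(r"=?\$?([A-Z]+)\$?(\d+):\$?([A-Z]+)\$?(\d+)", ref.strip())
    if m is None:
        raise ValueError(f"not a range reference: {ref!r}")
    c1, r1, c2, r2 = m.groups()
    return col_index(c1), int(r1), col_index(c2), int(r2)


def make_table(grid, ref, has_headers=True):
    """Cut the range out of the grid and return its rows as dicts keyed by header."""
    c1, r1, c2, r2 = parse_range(ref)
    block = [row[c1 - 1:c2] for row in grid[r1 - 1:r2]]
    if has_headers:
        header, body = block[0], block[1:]
    else:
        # No headings in the data: invent generic ones in the style step 3 mentions.
        header = [f"Column {i}" for i in range(1, c2 - c1 + 2)]
        body = block
    return [dict(zip(header, row)) for row in body]


def sort_table(records, column, descending=False):
    """Return the records ordered by one column, like a column's sort arrow."""
    return sorted(records, key=lambda r: r[column], reverse=descending)


def filter_table(records, column, keep):
    """Keep only records whose value in `column` is one of the selected values."""
    return [r for r in records if r[column] in keep]


# Hypothetical sheet: a title in row 1, headers in row 2, data below.
GRID = [
    ["Circulation by branch"],
    ["Branch", "Format", "Loans"],
    ["Main", "Book", 120],
    ["East", "DVD", 45],
    ["West", "Book", 80],
]

if __name__ == "__main__":
    table = make_table(GRID, "=$A$2:$C$5")  # start at row 2 to skip the title
    for record in sort_table(table, "Loans", descending=True):
        print(record)
    print(filter_table(table, "Format", {"Book"}))
```

Starting the range at row 2 plays the same role as editing $A$1 to $A$2 in the Format as Table dialog: the title row never becomes part of the table.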

If you want to convert an existing Excel Table back to a normal range of cells, select any cell in the table and click the Convert to Range button on the Table Tools > Design tab. All your data and formatting will be preserved.