Data Visualization

I have worked with a variety of tools to collect, analyze, and display data, including Python, d3.js, NetworkX, Gephi, NLTK, scikit-learn, MongoDB, SQLite, and Neo4j.

Information Architecture graph

This visualization, created with Gephi, was an attempt to understand the information architecture of a very large website, in this case nasa.gov. Node size corresponds to the number of pages that link to that page. For clarity, only nodes with more than four incoming links are shown.

To gather the data, I wrote a Python script that crawled the site to a specified depth and used NetworkX to build a network graph in which nodes are webpages and edges are links between pages.
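The general shape of such a crawl can be sketched as follows. This is a minimal illustration rather than the original script: it uses only the Python standard library, it records links as (source, target) pairs instead of building a NetworkX graph directly, and the names LinkParser, fetch_html, crawl, and in_degree are hypothetical. It also assumes the crawl stays on the starting site's hostname and ignores URL fragments.

```python
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Hypothetical default fetcher: returns the page body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth, fetch=fetch_html):
    """Breadth-first crawl; returns a set of (source, target) link edges.

    Pages up to max_depth links away from start_url are fetched.
    """
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    edges = set()
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc != site:
                continue  # assumption: stay on the starting site
            edges.add((url, target))
            if target not in seen and depth < max_depth:
                seen.add(target)
                queue.append((target, depth + 1))
    return edges


def in_degree(edges):
    """Counts incoming links per page (the quantity mapped to node size)."""
    return Counter(target for _, target in edges)
```

The resulting edge set can be passed to networkx.DiGraph(edges) to get a directed graph, and networkx.write_gexf can save that graph in a format Gephi opens. Filtering to pages whose in_degree count is greater than four would reproduce the threshold used in the visualization.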

Headline count

I used d3.js to create this interactive graph of the number of New York Times headlines referencing the word "ebola" over a 60-day period. Mousing over a data point reveals a tooltip with the date and the article count for that date. A live version, which includes the d3.js code and data, is hosted on bl.ocks.org.

I collected the data using a Python script that queries the New York Times Article Search API and writes the results to a CSV file.
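A rough sketch of that kind of collection script is below. It is not the original code. The function names (fetch_docs, count_headlines_by_date, write_counts) are hypothetical, and the request parameters and response fields (q, begin_date, end_date, page, api-key, response.docs, headline.main, pub_date) reflect my understanding of version 2 of the Article Search API and should be checked against the current documentation. It also assumes that a headline counts when it contains the search term, and that the first ten characters of pub_date are the calendar date.

```python
import csv
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for version 2 of the Article Search API.
SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def fetch_docs(term, begin_date, end_date, api_key, page=0):
    """Fetches one page of results; dates are YYYYMMDD strings."""
    params = urlencode({
        "q": term,
        "begin_date": begin_date,
        "end_date": end_date,
        "page": page,
        "api-key": api_key,
    })
    with urlopen(f"{SEARCH_URL}?{params}", timeout=10) as response:
        payload = json.load(response)
    return payload["response"]["docs"]


def count_headlines_by_date(docs, term):
    """Counts articles per day whose main headline contains the term."""
    counts = Counter()
    for doc in docs:
        headline = doc.get("headline", {}).get("main", "")
        if term.lower() in headline.lower():
            # Assumption: pub_date starts with an ISO date (YYYY-MM-DD).
            counts[doc["pub_date"][:10]] += 1
    return counts


def write_counts(counts, out_file):
    """Writes date,count rows in chronological order."""
    writer = csv.writer(out_file)
    writer.writerow(["date", "count"])
    for date in sorted(counts):
        writer.writerow([date, counts[date]])
```

A full run would call fetch_docs once per result page for the 60-day window, merge the returned lists, and pass them through count_headlines_by_date and write_counts with a file opened using newline="". Keeping the counting and writing steps separate from the network call means they can be checked against a few hand-written records without an API key.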