The blogs of interest to me are as follows:

Datasets for Data Science Projects

If you’ve ever worked on a personal data science project, you’ve probably spent a lot of time browsing the internet for interesting data sets to analyze. It can be fun to sift through dozens of data sets to find the perfect one. But it can also be frustrating to download and import several CSV files, only to realize that the data isn’t that interesting after all. Luckily, there are online repositories that curate data sets and (mostly) remove the uninteresting ones.

In this post, we’ll walk through several types of data science projects, including data visualization projects, data cleaning projects, and machine learning projects, and identify good places to find data sets for each. Whether you want to strengthen your data science portfolio by showing that you can visualize data well, or you have a spare few hours and want to practice your machine learning skills, we’ve got you covered.

 

I am an experienced software engineer who has worked in both back-end and front-end development. In the past few years, I have turned my focus to front-end development and to taking on leadership roles.

 


Datasets for Data Visualization Projects

A typical data visualization project might be something along the lines of “I want to make an infographic about how income varies across the different states in the US”. There are a few considerations to keep in mind when looking for good data for a data visualization project:

  • It shouldn’t be messy, because you don’t want to spend a lot of time cleaning data.

  • It should be nuanced and interesting enough to make charts about.

  • Ideally, each column should be well-explained, so the visualization is accurate.

  • The data set shouldn’t have too many rows or columns, so it’s easy to work with.
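Some of the considerations above (messiness and size in particular) can be checked quickly before committing to a data set. What follows is only a rough sketch using Python's standard library: the function name `summarize_csv` and the sample rows are made up for illustration, and "messy" is assumed here to mean nothing more than empty cells.

```python
import csv
import io


def summarize_csv(lines):
    """Count rows, columns, and empty cells per column in CSV input.

    `lines` is any iterable of text lines, such as an open file.
    """
    reader = csv.DictReader(lines)
    columns = reader.fieldnames or []
    n_rows = 0
    missing = {name: 0 for name in columns}
    for row in reader:
        n_rows += 1
        for name in columns:
            value = row.get(name)
            # A short row yields None; an empty cell yields "".
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"rows": n_rows, "columns": len(columns), "missing": missing}


# Made-up sample data, standing in for a downloaded file.
SAMPLE = "state,median_income\nCA,80000\nNY,\nTX,67000\n"

summary = summarize_csv(io.StringIO(SAMPLE))
print(summary)
```

For a real file, the same function could be called on `open("some_file.csv", newline="")`, where the file name is a placeholder. A high share of empty cells or a very wide table would suggest looking elsewhere.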

Datasets for Data Processing Projects

Sometimes you just want to work with a large data set. The end result doesn’t matter as much as the process of reading in and analyzing the data. You might use tools like Spark (which you can learn in our Spark course) or Hadoop to distribute the processing across multiple nodes. Things to keep in mind when looking for a good data processing data set:

  • The cleaner the data, the better, since cleaning a large data set can be very time-consuming.

  • There should be an interesting question that can be answered with the data.
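To get a feel for the "reading in and analyzing" part without a cluster, the idea of splitting work into pieces and merging partial results can be tried on a single machine. The sketch below is not Spark or Hadoop code; it is a standard-library approximation of the same pattern. The names `partial_counts` and `count_in_chunks`, the chunk size, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter
from itertools import islice


def partial_counts(rows, key):
    """Count values of one column within a single chunk of rows."""
    return Counter(row[key] for row in rows)


def count_in_chunks(lines, key, chunk_size=2):
    """Stream CSV input in fixed-size chunks and merge per-chunk counts.

    Only one chunk is held in memory at a time, so the input can be
    larger than RAM. On a cluster, each chunk would go to a worker.
    """
    reader = csv.DictReader(lines)
    total = Counter()
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        total.update(partial_counts(chunk, key))
    return total


# Made-up sample data, standing in for a large file.
SAMPLE = "state,income\nCA,10\nNY,20\nCA,30\nTX,5\nCA,1\n"

counts = count_in_chunks(io.StringIO(SAMPLE), "state")
print(counts.most_common())
```

Because the per-chunk counts are merged by simple addition, the chunks could in principle be processed in any order or in parallel, which is what makes this kind of question a good fit for distributed tools.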

Cloud hosting providers like Amazon and Google are good places to find big data sets. They have an incentive to host data, because analyzing it usually means using their infrastructure (and thus paying them).

Coding Horror:

http://www.codinghorror.com/blog/

 

Stack Overflow:

http://stackoverflow.com/

 

Greg's Cool Thing of the Day:

http://coolthingoftheday.blogspot.com/

 

 

© 2020 ~ Designed and Developed by PATRICE CONNOLLY a professional who will proudly represent your efforts...