Photo by Simon Abrams on Unsplash

Introduction

Let me start off by acknowledging that this subject has been written about and discussed repeatedly on many different platforms. So, what could I offer that hasn’t already been said? Well, as it turns out, I’m a lot older than the average Medium author. Although some people politely refer to me as “mature” or “experienced,” the truth is that my perspective is shaped by the fact that I’ve been around long enough to remember when neon colors and gradients on bar charts were cool. Therefore, what I can offer you is advice that stands the test of time.

Data Visualization

As an analyst, you will probably spend at least 10 times as much time analyzing the data as you will have to present the information to your audience. Therefore, graphics and other visual representations are important because they create a better understanding, even for those who aren’t trained in analyzing datasets. By building visualizations, you can help your company’s decision-makers understand complex ideas at a glance.

So how do you improve this skill? Well, there are plenty of training options in the form of online courses and plenty of tools specializing in visualization. However, I’m hesitant to recommend a specific technology or course because the world changes so quickly, and I want to offer advice that stands the test of time.

Therefore, make a habit of exploring other people’s work. Build a bookmarks folder called “Inspiration” and fill it with blog links. Dedicate 15 minutes weekly to browsing the blogs to fill your brain with possibilities.

I also recommend buying a couple of books. Edward Tufte is the grandfather of data visualization, and I’m also a fan of Nathan Yau, whose site offers books, blogs, courses, and tutorials. I keep both of his books, Data Points and Visualize This, on my bookshelf for inspiration.

The mechanics of how you will actually create the visualization will come with practice. Some of you will fall in love with R or Python, and others with Excel or Tableau. The key is having your brain filled with possibilities, so you can imagine what you want before you build it.

Data Cleaning

Data scientists are known for their claim that “80% of building a machine-learning model is preparing and cleaning the data.” However, this is also true for data analysts. Whatever the actual percentage, the truth is that a large proportion of time spent with data, in general, is spent cleaning it.

Cleaning data is important because uncleaned data can produce misleading patterns and lead to mistaken conclusions. Early in my career, I was tasked with delivering the percentage of support calls that were resolved on the first try. As I walked my management through the results, one of them insisted that the numbers “didn’t make sense.” When I did more research, I discovered that the calls data had a column called “status” that sometimes was populated with an “X.” Apparently, the system would record an “X” for a test record, which should be ignored.

Data cleaning skills grow with hands-on practice and business expertise. In general, competition sites like Zindi are not the best places to practice data cleaning because they are focused on data science, and the datasets are usually pretty clean already. On the other hand, government websites are a great place to find datasets that are messy. You can also follow the TidyTuesday project, even if you’re not an R user, to find interesting datasets and get familiar with the types of cleaning steps that occur in the wild.

SQL

As I mentioned, I am hesitant to recommend a specific tool or technology because of how quickly the landscape changes. It is probably safe to say that Python and R are here to stay, but SQL is on another level.
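
To make the support-call story from the data cleaning discussion a little more concrete, here is a minimal Python sketch of how test records might skew a first-try resolution rate. Only the “status” column and its “X” marker come from the story. The sample rows, the other field names (`call_id`, `resolved_first_try`), and the helper functions are made up for illustration and are not the actual system or data.

```python
# Hypothetical call records. Only the "status" column and its "X" marker
# come from the story; every other field and value is invented.
calls = [
    {"call_id": 1, "resolved_first_try": True, "status": ""},
    {"call_id": 2, "resolved_first_try": False, "status": ""},
    {"call_id": 3, "resolved_first_try": True, "status": "X"},  # test record
    {"call_id": 4, "resolved_first_try": True, "status": "X"},  # test record
    {"call_id": 5, "resolved_first_try": False, "status": ""},
    {"call_id": 6, "resolved_first_try": True, "status": ""},
]


def first_try_rate(records):
    """Return the percentage of records resolved on the first try."""
    if not records:
        return 0.0
    resolved = sum(1 for r in records if r["resolved_first_try"])
    return 100.0 * resolved / len(records)


def drop_test_records(records):
    """Keep only records whose status is not the test marker "X"."""
    return [r for r in records if r["status"].strip().upper() != "X"]


raw_rate = first_try_rate(calls)
clean_rate = first_try_rate(drop_test_records(calls))

print(f"Before cleaning: {raw_rate:.1f}%")   # Before cleaning: 66.7%
print(f"After cleaning:  {clean_rate:.1f}%")  # After cleaning:  50.0%
```

In this invented sample, two test records are enough to move the headline number from 50.0% to 66.7%, which is the kind of gap that makes a manager say the numbers “don’t make sense.”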