RStudio is THE standard for exploratory data analysis on large data sets
November 29, 2018

Leah Jakaitis | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User

Overall Satisfaction with RStudio

RStudio is used as an R development environment for cleaning, manipulating, and analyzing large data sets. It is used in conjunction with Python for data science tasks. RStudio is used across the entire organization as a complement to other technologies and to support data science and analysis projects. In my role, I gather large data sets (more than 500,000 or even a million rows) from different platforms, and rely on RStudio to prepare the data for further analysis. It's an excellent platform for preliminary / exploratory data analysis: to get an understanding of the trends and behaviors exhibited by the data set, and to guide later analytic decisions.
  • Create and manipulate data frames: the syntax is intuitive, and the console lets you see results / behaviors immediately.
  • Visualization (especially using Shiny or other visualization packages): so many different kinds of graphs and visualizations are available.
  • Sharing results and community documentation: extensive information is available on use and applications of different packages, making RStudio (and R) very versatile for a variety of analysis projects.
  • R has a fairly steep learning curve and can be intimidating for new users. The R package swirl, which runs inside RStudio, is a useful introductory tutorial on R's use and capabilities, but it is limited.
  • RStudio sometimes has stability problems when working with very large data sets. This is because R holds the data it works on in the computer's memory (RAM). A quick calculation can be used to determine whether the data set's size exceeds the computer's memory capacity, though.
  • Quickly analyze data to determine validity, and if further exploration is needed (basically as a triage to assess data trends/behavior/usefulness).
  • Code can be re-used and redeployed to save time and improve organization efficiency.
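
The quick memory calculation mentioned above can be sketched roughly as follows. This is only a back-of-the-envelope illustration, written in Python since it is used alongside R here. It assumes every cell is an 8-byte number (the size of an R numeric value); the function names and the `headroom` safety factor are hypothetical, chosen only to leave room for the copies that are made while working with the data.

```python
# Back-of-the-envelope check: will a numeric table fit in memory?
# Assumption: every cell is an 8-byte double; text columns will differ.

def estimated_size_gb(rows: int, cols: int, bytes_per_cell: int = 8) -> float:
    """Approximate in-memory size of a rows x cols numeric table, in GiB."""
    return rows * cols * bytes_per_cell / 1024**3


def fits_in_memory(rows: int, cols: int, ram_gb: float, headroom: float = 3.0) -> bool:
    """True if the table, scaled by a safety factor, fits in ram_gb of RAM.

    headroom is an assumed multiplier (not an official figure) that leaves
    space for temporary copies created during analysis.
    """
    return estimated_size_gb(rows, cols) * headroom <= ram_gb


if __name__ == "__main__":
    size = estimated_size_gb(1_000_000, 50)
    print(f"{size:.2f} GiB")                     # 0.37 GiB
    print(fits_in_memory(1_000_000, 50, ram_gb=8))
```

A million rows by 50 numeric columns comes to well under half a gigabyte, so a typical laptop handles it comfortably; the trouble starts when the estimate approaches the machine's total RAM.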
RStudio works similarly to PyCharm (and PyCharm can support R code) insofar as it's a development environment meant to improve the coding experience and easily provide commonly used resources (packages). They both provide a navigable dev environment with some learning curve. RStudio is more bare-bones, though: it has fewer bells and whistles (like night mode, extensive additional language support, etc). I usually select RStudio if I'm just doing a basic internal analysis on data, because it's what I'm most familiar with and is usually the easiest to re-deploy for analyzing other sets of data.
RStudio is well suited for ingesting and analyzing large data sets in a variety of formats, including CSV files. A large number of packages are supported to enable all kinds of projects: time series analysis, visualization, table-building, and advanced statistical analysis are all examples of RStudio's applications. There is exhaustive community documentation available online about how and when to deploy different packages (and their functions), and also how to troubleshoot the issues users may run into.

For more extensive analysis and polished visualization, Python is generally the recommended language. It's also where the industry (data science, data analysis, etc.) is heading overall. R is still used extensively in the field, and is a standard part of the statistics curriculum in academia.

Using RStudio

Pros
Well integrated
Consistent
Convenient
Feel confident using

Cons
Unnecessarily complex
Difficult to use
Slow to learn
Lots to learn
  • Ingesting data from common file types (CSV, XLSX).
  • Performing basic visualization or analysis.
  • swirl - can't recommend the built-in tutorials enough!