Data Science for Big Data Analytics - Essentials

By attending the Data Science for Big Data Analytics - Essentials workshop, participants will:

  • Harness data mining methods to answer crucial business questions from internal and external data sources
  • Create competitive advantage from both structured and unstructured data
  • Predict outcomes with supervised machine learning techniques
  • Unearth patterns in customer behavior with unsupervised techniques
  • Work with R and RHadoop to analyze structured, unstructured and Big Data

Big Data Analytics allows organizations to build competitive strategies around data-driven insights and derive value from vast amounts of untapped data. Whether you are tracking the efficiency of a warehouse or predicting how and when to modify staffing levels in a call center, this Data Science for Big Data Analytics - Essentials training course provides the knowledge and skills required to reach the next level of decision-making maturity.

The Data Science for Big Data Analytics - Essentials class is intended for managers, data and business analysts, database professionals and others involved in forecasting and trend management. Programming experience and a background in statistics are helpful, but not required.

COURSE AGENDA

  • Mining unstructured data for business applications
    • Preprocessing unstructured data in preparation for deeper analysis
    • Describing a corpus of documents with a term-document matrix
  • Coping with the additional complexities of Big Data
    • Examining the MapReduce and Hadoop architectures
    • Integrating R and Hadoop with RHadoop
  • Exploratory Data Analysis with R
    • Loading, querying and manipulating data in R
    • Cleaning raw data for modeling
    • Reducing dimensions with Principal Component Analysis
    • Extending R with user-defined packages
  • Facilitating good analytical thinking with data visualization
    • Investigating characteristics of a data set through visualization
    • Charting data distributions with boxplots, histograms and density plots
    • Identifying outliers in data
  • Estimating future values with linear and logistic regression
    • Modeling the relationship between an output variable and several input variables
    • Correctly interpreting coefficients of continuous and categorical data
  • Regression techniques for dealing with Big Data
    • Overcoming issues of volume with RHadoop
    • Creating regression modules for RHadoop
  • Automating the labeling of new data items
    • Predicting target values using Decision Trees
    • Applying probabilistic methods to predict outcomes with Naive Bayes
    • Combining tree predictors with random forests in RHadoop
  • Assessing model performance
    • Visualizing model performance with a ROC curve
    • Evaluating classifiers with confusion matrices
  • Identifying previously unknown groupings within a data set
    • Segmenting the customer market with the K-Means algorithm
    • Defining similarity with appropriate distance measures
    • Constructing tree-like clusters with hierarchical clustering
    • Clustering text documents and tweets to aid understanding
  • Discovering connections with Link Analysis
    • Capturing important connections with Social Network Analysis
    • Exploring how social network results are used in marketing
  • Building and evaluating association rules
    • Capturing true customer preferences in transaction data to enhance customer experience
    • Calculating support, confidence and lift to distinguish “good” rules from “bad” rules
    • Differentiating actionable, trivial and inexplicable rules
    • Meeting the challenge of large data sets when searching for rules with RHadoop
  • Constructing recommendation engines
    • Cross-selling, upselling and substitution as motivations
    • Leveraging recommendations based on collaborative filtering
  • Expanding analytic capabilities
    • Breaking down Big Data Analytics into manageable steps
    • Integrating analytics into current business processes
    • Reviewing Spark, MLlib and Mahout for machine learning
  • Dissemination and Big Data policies
    • Examining ethical questions of privacy in Big Data
    • Disseminating results to different types of stakeholders