Data Analysis Assignment Help with Spark In Python

Codersarts is a top-rated website for students looking for online Data Analytics assignment help, homework help, and coursework help with Apache Spark, PySpark, MLlib, tweepy, and other libraries and tools. We support students at all levels, whether it is school, college, or university coursework or a real-time project. Hire us and get your projects done by Data Analytics experts.


There are two common types of data analytics over social media data:

  • Machine Learning Algorithms: Apply classification to Tweets

  • Real time analysis of Tweets: Spark Streaming Library



Data Analysis


Recommendation Models:

  • Content-based filtering

  • Collaborative filtering

  • Matrix factorization

  • Alternating least squares


Classification Models:

  • Linear models

  • Logistic regression

  • Support vector machines (SVM)

  • Decision trees

  • Naïve Bayes


Clustering Models:

  • k-means clustering

  • Hierarchical clustering

  • Kohonen networks (self-organizing maps)


Classification


Classification is a form of supervised learning in which we train a model on labelled training examples.


Can be used for:

  • Predicting the probability of Internet users clicking on an online advert; here, the classes are binary in nature (that is, click or no click)

  • Classifying images, video or sounds

  • Assigning categories or tags to news articles, web pages, tweets (multiclass)

  • Discovering e-mail and web spam (binary)

  • Ranking customers or users in order of probability that they might purchase a product or use a service

  • Predicting customers or users who might stop using a product, service or provider (called churn)

  • And other cases
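As a rough illustration of training a classifier on labelled examples, here is a minimal multinomial Naïve Bayes sketch for tweet sentiment. It is written in plain Python rather than Spark MLlib so that it runs without a cluster; the example tweets, the labels, and the function names (`tokenize`, `train_naive_bayes`, `predict`) are all made up for this sketch.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(examples: List[Tuple[str, str]]):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(model, text: str) -> str:
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_examples)
        # Laplace (add-one) smoothing avoids zero probabilities.
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled tweets, invented for illustration only.
training_tweets = [
    ("i love this phone", "positive"),
    ("what a great day", "positive"),
    ("i hate this traffic", "negative"),
    ("what a terrible service", "negative"),
]

model = train_naive_bayes(training_tweets)
prediction_a = predict(model, "love this great phone")
prediction_b = predict(model, "terrible traffic")
```

The same train-then-predict pattern applies to the other classifiers listed above; only the way the model is fitted changes.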


Clustering


Clustering is a form of unsupervised learning in which each training example is assigned to a segment called a cluster.


Can be used for:

  • Segmenting users or customers into different groups based on behavior characteristics and metadata

  • Grouping content on a website or products in a retail business

  • Segmenting communities in social media networks

  • Topic clustering of Tweets


K-means clustering approach


Clustering is the process of grouping a set of objects into classes of similar objects:

  • Documents within a cluster should be similar.

  • Documents from different clusters should be dissimilar.


In principle, the optimal partition is achieved by minimising the sum of squared distances from each object to the “representative object” (the centroid) of its cluster.
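The objective above can be sketched with a small k-means loop. This is a minimal plain-Python version, not the Spark MLlib implementation: it assumes points are numeric tuples, and it initialises the centroids with the first k points so the result is reproducible (real libraries usually use random or k-means++ initialisation). The function names and sample points are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points: List[Point], k: int, iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Alternate assignment and update steps for a fixed number of iterations."""
    # Assumption for this sketch: the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return centroids, assignments


def within_cluster_sse(points: List[Point], centroids: List[List[float]], assignments: List[int]) -> float:
    """Sum of squared distances from each point to its cluster's centroid."""
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))


# Four toy 2-D points forming two obvious groups (invented data).
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, assignments = kmeans(points, k=2)
sse = within_cluster_sse(points, centroids, assignments)
```

For tweets, each document would first be turned into a numeric vector (for example, term frequencies) before a loop like this can be applied.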


Historical Data Analysis with MLlib (MLDataAnalysis.scala):

  • Data Representation

  • Clustering tweets by text

  • Classification of tweets by sentiment (negative, positive, etc.)

  • Result visualization in Zeppelin


Streaming Data Analysis (CollectingTweetsToFile.scala, CollectingTweetsToMongoDB.scala, witterStreamingAnalyzer.scala):

  • Stream tweets to a JSON file

  • Stream tweets to MongoDB
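A rough sketch of the "stream tweets to a JSON file" step, in plain Python. It does not reproduce the Scala files named above and does not call the Twitter API or Spark Streaming: `fake_tweet_stream` is a hypothetical stand-in for a live stream, and the field names (`id`, `user`, `text`) are assumptions. Each tweet is appended as one JSON object per line, which keeps the file readable while the stream is still running.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, Iterator


def fake_tweet_stream() -> Iterator[Dict[str, object]]:
    """Hypothetical stand-in for a live tweet stream (invented data)."""
    yield {"id": 1, "user": "alice", "text": "Spark Streaming is fun"}
    yield {"id": 2, "user": "bob", "text": "Clustering tweets with k-means"}


def append_tweets_to_file(tweets: Iterable[Dict[str, object]], path: str) -> int:
    """Append each tweet as one JSON line and return how many were written."""
    written = 0
    with open(path, "a", encoding="utf-8") as handle:
        for tweet in tweets:
            handle.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            written += 1
    return written


# Write to a temporary directory so the sketch leaves no files behind in the project.
output_path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")
count = append_tweets_to_file(fake_tweet_stream(), output_path)

# Read the file back to confirm every line is a valid JSON document.
with open(output_path, encoding="utf-8") as handle:
    stored = [json.loads(line) for line in handle]
```

Writing to MongoDB would follow the same shape, with the file write replaced by an insert into a collection.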
