In this use case, we show you how to build your own sentiment analysis classifier for stock news, completely from scratch! You'll scrape some interesting stock news from the internet to create your own dataset and then use Kern refinery to label the data quickly and easily.
You can import the snapshot.json.zip file from the start screen of the application (https://localhost:4455).
Labels
The goal of the sentiment classifier is to predict whether a stock news headline is Positive, Neutral or Negative.
Heuristics
We can start by building a function that uses a regular expression to detect whether a headline contains a phrase like "up by 20%", which is often the case for stock news. We'll use this pattern to build our first labeling function.
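A minimal sketch of such a regex labeling function follows. The exact pattern from the original project isn't shown here, so `MOVE_PATTERN` and `percentage_move` are hypothetical names. The sketch assumes that the record exposes the headline as `record["Headline"].text` (like the distant supervisor in this use case does) and that returning nothing means the function abstains. `SimpleNamespace` is only a stand-in for a refinery record so that the snippet runs on its own.

```python
import re
from types import SimpleNamespace

# Hypothetical pattern: "up" or "down", an optional "by", then a number and "%",
# e.g. "up by 20%" or "down 3.5%". Case-insensitive so "Up" at the start also matches.
MOVE_PATTERN = re.compile(r"\b(up|down)\s+(?:by\s+)?\d+(?:\.\d+)?\s?%", re.IGNORECASE)


def percentage_move(record):
    """Label a headline Positive/Negative if it reports a percentage move; else abstain."""
    match = MOVE_PATTERN.search(record["Headline"].text)
    if match:
        # One function covers both directions; "up" means Positive, "down" means Negative.
        return "Positive" if match.group(1).lower() == "up" else "Negative"
    return None  # no percentage move found, so the function abstains


# Stand-in record: refinery text attributes expose their raw string via `.text`.
example = {"Headline": SimpleNamespace(text="Tesla shares up by 20% after strong deliveries")}
print(percentage_move(example))  # Positive
```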
See here for an explanation of the regex function. By the way, we could also have implemented this as two separate functions if we wanted to.
Next, we know that certain key terms recur in the headlines. One very simple example is the famous investor Warren Buffett: headlines mentioning him are usually written in a Neutral way. The simplest way to implement this is a plain keyword check.
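A minimal sketch of that keyword check as a labeling function. `mentions_buffett` is a hypothetical name. As in the distant supervisor in this use case, it assumes the headline is available as `record["Headline"].text`, and it treats returning nothing as abstaining. `SimpleNamespace` only stands in for a refinery record so the snippet runs on its own.

```python
from types import SimpleNamespace


def mentions_buffett(record):
    """Label headlines about Warren Buffett as Neutral; abstain otherwise."""
    # "warren buffet" is a prefix of "warren buffett", so this catches both spellings.
    if "warren buffet" in record["Headline"].text.lower():
        return "Neutral"
    return None  # keyword not present, so the function abstains


example = {"Headline": SimpleNamespace(text="Warren Buffett trims stake in bank")}
print(mentions_buffett(example))  # Neutral
```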
Alternatively, we could implement this as a distant supervisor that looks up a list of famous investors. It would look something like this:
from knowledge import famous_investors  # we'd need to create a lookup list in the app for this

def contains_investor(record):
    for investor in famous_investors:
        if investor.lower() in record['Headline'].text.lower():
            return 'Neutral'
Next, we're going to do something that almost always boosts your labeling: integrating an active learner. We've created embeddings using the transformer zhayunduo/roberta-base-stocktwits-finetuned from 🤗 Hugging Face, so we can now implement the following:
from sklearn.linear_model import LogisticRegression
# you can find further models here: https://scikit-learn.org/stable/supervised_learning.html#supervised-learning

# LearningClassifier, params_fit and params_inference come from refinery's active learning environment
class MyActiveLearner(LearningClassifier):

    def __init__(self):
        self.model = LogisticRegression()

    @params_fit(
        embedding_name="Headline-classification-zhayunduo/roberta-base-stocktwits-finetuned",  # pick this from the options above
        train_test_split=0.5  # we currently have this fixed, but you'll soon be able to specify this individually!
    )
    def fit(self, embeddings, labels):
        self.model.fit(embeddings, labels)

    @params_inference(
        min_confidence=0.0,  # we want every prediction, but we could also increase the minimum required confidence
        label_names=None  # you can specify a list to filter the predictions (e.g. ["label-a", "label-b"])
    )
    def predict_proba(self, embeddings):
        return self.model.predict_proba(embeddings)
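To see roughly what this active learner does outside of refinery, here is a standalone sketch using scikit-learn directly. It makes several assumptions: random 768-dimensional vectors stand in for the real roberta embeddings (768 is the RoBERTa-base hidden size), a 50/50 split mirrors `train_test_split=0.5`, and `min_confidence` is treated as a cutoff on the top class probability. `predict_with_threshold` is a hypothetical helper, not part of refinery's API.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: random vectors instead of real headline embeddings,
# shifted per class so that the toy problem is learnable.
rng = np.random.default_rng(seed=0)
labels = np.array(["Positive", "Neutral", "Negative"] * 20)
class_offset = {"Positive": 1.0, "Neutral": 0.0, "Negative": -1.0}
embeddings = rng.normal(size=(len(labels), 768)) + np.array(
    [[class_offset[label]] for label in labels]
)

# 50/50 split, mirroring train_test_split=0.5 in the refinery decorator.
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.5, random_state=0, stratify=labels
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)


def predict_with_threshold(model, X, min_confidence=0.0):
    """Hypothetical helper: return (label, confidence) pairs at or above min_confidence."""
    probabilities = model.predict_proba(X)  # columns follow the order of model.classes_
    results = []
    for row in probabilities:
        best = int(row.argmax())
        confidence = float(row[best])
        if confidence >= min_confidence:
            results.append((model.classes_[best], confidence))
    return results


print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
for label, confidence in predict_with_threshold(model, X_test[:5]):
    print(f"{label}: {confidence:.2f}")
```

With `min_confidence=0.0` every prediction is kept; raising it trades coverage for precision, which is the same choice the comment in the refinery decorator describes.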
And that's it! From here, we can create a first version and build a simple classifier, e.g. via automl-docker. You can also keep building heuristics and thereby improve both label quantity and quality.
If you like what we're working on, please leave a ⭐ for refinery!
About
Examples of projects you can use to test refinery. Please select a use case from the branches.