We want to end up with code we can easily apply to different data sets and environments, such as the Python Script Editor in Power BI. The image above shows the emojis we'll use in this tutorial. So let's review the code and each essential step in applying it to Power BI.

Let's go to the top of the notebook and bring in the libraries we need. The first is pandas, our data manipulation library, and the second is emoji, which allows us to identify and decode emojis. If you don't have these libraries in your Python environment, you need to install them before we can proceed. Next, we'll use the Natural Language Toolkit (NLTK) to conduct deeper text analysis. From it we'll bring in word_tokenize, which allows us to look at each word and evaluate it. From the collections library, we'll also import Counter for counting and running a frequency analysis on our emojis.

Our data set is WallStreetBets from Reddit, and you can see below that it has a head, title, id, and body section. This data set will allow us to evaluate all the different rows inside it. But first we need to eliminate some rows and create what we call a bag of words. We'll look for body sections that don't have a comment. If the na check on a row returns False, as highlighted above, we want to keep it.

Next, we'll pre-process this data set into a bag of words without looking at each section. So let's create another line and enter the code highlighted in the image below to join the first 10 body sections in our data set. But be sure to delete the Bow = part in the code because we already have that variable in the next line.

After that, we click Run, and what we get is the 10 body sections joined into one run of words. The next step is to turn those words into tokens. Let's go back to the line we made earlier and assign it to a variable called a, so we put a = at the beginning. And because we want to tokenize it, we'll type word_tokenize(a) at the bottom and click Run. We can see from the results above that we no longer have the single collection of words that we had when we joined those 10 sections. Now we have words in a list, indicated by brackets, which we can review one by one to see whether each is an emoji. We no longer need this variable, so we'll remove it by clicking the scissors icon at the top left, highlighted in the previous image.

We'll then write a list comprehension saying that for each element in our tokens, if it is an emoji, we want to return only the emoji, as highlighted below. Then we create a counter (highlighted in the In section below) of those different emojis and turn it into a data frame. That's because we're applying the data frame function (highlighted in the In section below), which presents each emoji and its count in a row and a column. To recap, we turn the text into a bag of words, then into a list using tokenize, and finally count it using the counter. Thus we counted each of the items in that list and ended up with the emojis below, highlighted in the Out section. You can see that the diamond (diamond hands was a popular phrase on WallStreetBets) and the rocket are the top emojis used.

Creating A Function

After creating these lines independently, we want to turn them into a function, which is nothing more than repeatable code. So far, we have created our bag of words and tokens, searched for emojis, counted those emojis, and created a table with the data frame. Now we can package those repeatable steps and run the function highlighted above over any of our existing columns. This function has two arguments: data and column.

Because of the second argument, we should change body inside the bow = " ".join(...) line to column, as we want to make sure the column that's passed in is used and not just the body. Thus, the join now works on whichever column we pass in, like in the image below.

So let's open up Power BI and go to Transform Data using our WallStreetBets data set. After that, we go back to our notebook to get our code. We highlight the code we need and press Ctrl + C to copy it. Next, we go to Run Python Script on the Transform tab, where a note says that the variable dataset holds the data, which is provided as a pandas data frame. We paste the libraries we copied from our notebook here.

Now we go back to our Jupyter notebook and copy our function, highlighted in the image below. We then paste our function into the Python script one line below the libraries. To document our function, we'll write # our emoji function at the top. We continue by calling our function, so we copy def get_emojis(data, column) and paste it at the bottom.
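The steps in this tutorial (drop empty bodies, build a bag of words, pick out the emojis, count them) can be sketched as one small function. This is a minimal sketch and not the notebook's exact code. To stay runnable with the standard library alone, it makes two substitutions that are assumptions of this example: a character-by-character scan stands in for word_tokenize, and a Unicode category check (the hypothetical helper is_emoji_like) stands in for the emoji library's lookup. The sample data is also made up for illustration.

```python
import unicodedata
from collections import Counter


def is_emoji_like(char):
    # Rough stand-in for the emoji library's lookup (an assumption of this
    # sketch): treat Unicode "Symbol, other" characters as emojis.
    return unicodedata.category(char) == "So"


def get_emojis(data, column):
    # Keep only real strings, which skips missing (NaN/None) entries.
    texts = [text for text in data[column] if isinstance(text, str)]
    # Bag of words: one long string built from every remaining row.
    bow = " ".join(texts)
    # Stand-in for word_tokenize: scan characters so that adjacent emojis
    # are counted separately.
    found = [char for char in bow if is_emoji_like(char)]
    # Count each emoji and return (emoji, count) pairs, most common first.
    return Counter(found).most_common()


# Hypothetical sample data; a pandas DataFrame can be passed the same way.
sample = {"body": ["To the moon 🚀🚀", None, "Diamond hands 💎 🚀"]}
result = get_emojis(sample, "body")
# result == [("🚀", 3), ("💎", 1)]
```

Because the function only needs data[column] to yield the row values, it should accept a pandas data frame as well as the plain dictionary used here. The category check is deliberately loose (it also matches symbols such as ©), so with the emoji package installed, its own lookup is the better filter. In Power BI the list of pairs would still need to be wrapped in a data frame before the script returns it.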