Cookie Notice

As far as I know, and as far as I remember, nothing in this page does anything with Cookies.

2016/10/30

Net::Twitter Cookbook: Favorites and Followers

Favorites.

Also known as "Likes", they're an indication in Twitter that you approve of a status update. Most of the time, they're paired with retweets as signs by the audience to the author that the post is agreeable. Like digital applause.

This is all well and good, but it could be used for so much more, if you had more access and control over them.

So I did.

The first step is to collect them. There's an API to get them, and collecting them in bulk is easy. A problem is avoiding grabbing the same tweet twice.

# as before, the "boilerplate" can be found elsewhere in my blog.
use IO::Interactive qw{ interactive } ;
my $config ;
$config->{start} = 0 ;
$config->{end}   = 200 ;

for ( my $page = $config->{start}; $page <= $config->{end}; ++$page ) {
        say {interactive} qq{\tPAGE $page} ;
        my $r = $twit->favorites( { 
            page => $page ,
            count => 200 ,
            } ) ;
        last unless @$r ;

        # push @favs , @$r ;
        for my $fav (@$r) {
            if ( $config->{verbose} ) { 
                 say {interactive} handle_date( $fav->{created_at} ) 
                 }
            store_tweet( $config->{user}, $fav ) ;
            }
        sleep 60 * 3 ;    # five minutes
        }


Once I had a list of my tweets, one of the first things I did was use them to do "Follow Friday". If you know who you favorited over the last week, it's an easy thing to get a list of the usernames, count them and add them until you have reached the end of the list or 140 characters.

Then, as I started playing with APIs and wanted to write my own ones, I created an API to find ones containing a substring, like json or pizza or sleep. This way, I could begin to use a "favorite" as a bookmark.

(I won't show demo code, because I'm not happy or proud of the the code, which lives in a trailing-edge environment, and because it's more database-related than Twitter-focused.)

As an aside, I do not follow back. There are people who follow me who I have no interest in reading, and there are people I follow who care nothing about my output. In general, I treat Twitter as something between a large IRC client and an RSS reader, and I never expected nor wanted RSS feeds to track me.

But this can be a thing worth tracking, which you can do, without any storage, with the help of a list. Start with getting a list of those following you, those you follow, and the list of accounts (I almost wrote "people", but that isn't guaranteed) in your follows-me list. If they follow you and aren't in your list, add them. If they're in the list and you have started following them, take them out. If they're on the list and aren't following you, drop them. As long as you're not big-time (Twitter limits lists to 500 accounts), that should be enough to keep a Twitter list of accounts you're not following.

use List::Compare ;

    my $list = 'id num of your Twitter list';

    my $followers = $twit->followers_ids() ;
    my @followers = @{ $followers->{ids} } ;

    my $friends = $twit->friends_ids() ;
    my @friends = @{ $friends->{ids} } ;

    my @list = get_list_members( $twit, $list ) ;
    my %list = map { $_ => 1 } @list ;


    my $lc1 = List::Compare->new( \@friends,   \@followers ) ;
    my $lc2 = List::Compare->new( \@friends,   \@list ) ;
    my $lc3 = List::Compare->new( \@followers, \@list ) ;

    # if follows me and I don't follow, put in the list
    say {interactive} 'FOLLOWING ME' ;
    for my $id ( $lc1->get_complement ) {
        next if $list{$id} ;
        add_to_list( $twit, $list, $id ) ;
        }

    # if I follow, take off the list
    say {interactive} 'I FOLLOW' ;
    for my $id ( $lc2->get_intersection ) {
        drop_from_list( $twit, $list, $id ) ;
        }

    # if no longer following me, take off the list
    say {interactive} 'NOT FOLLOWING' ;
    for my $id ( $lc3->get_complement ) {
        drop_from_list( $twit, $list, $id ) ;
        }

#========= ========= ========= ========= ========= ========= =========
sub add_to_list {
    my ( $twit, $list, $id ) = @_ ;
    say STDERR qq{ADDING $id} ;
    eval { $twit->add_list_member(
            { list_id => $list, user_id => $id, } ) ; } ;
    if ($@) {
        warn $@->error ;
        }
    }

#========= ========= ========= ========= ========= ========= =========
sub drop_from_list {
    my ( $twit, $list, $id ) = @_ ;
    say STDERR qq{REMOVING $id} ;
    eval {
        $twit->delete_list_member( { list_id => $list, user_id => $id, } ) ;
        } ;
    if ($@) {
        warn $@->error ;
        }
    }



But are there any you should follow? Are there any posts in the the feed that you might "like"? What do you "like" anyway?

There's a way for us to get an idea of what you would like, which is your past likes. First, we must get, for comparison, a collection of what your Twitter feed is like normally. (I grab 200 posts an hour and store them. This looks and works exactly like my "grab favorites code", except I don't loop it.

    my $timeline = $twit->home_timeline( { count => 200 } ) ;

    for my $tweet (@$timeline) {
        my $id          = $tweet->{id} ;                          # twitter_id
        my $text        = $tweet->{text} ;                        # text
        my $created     = handle_date( $tweet->{created_at} ) ;   # created
        my $screen_name = $tweet->{user}->{screen_name} ;         # user id
        if ( $config->{verbose} ) {
            say {interactive} handle_date( $tweet->{created_at} );
            say {interactive} $text ;
            say {interactive} $created ;
            say {interactive} $screen_name ;
            say {interactive} '' ;
            }
        store_tweet( $config->{user}, $tweet ) ;
        # exit ;
        }


So, we have a body of tweets that you like, and a body of tweets that are a representative sample of what Twitter looks like to you. On to Algorithm::NaiveBayes!

use Algorithm::NaiveBayes ;
use IO::Interactive qw{ interactive } ;
use String::Tokenizer ;

my $list   = 'ID of your list';
my $nb     = train() ;
my @top    = read_list( $config, $nb , $list ) ;

say join ' ' , (scalar @top ), 'tweets' ;

for my $tweet (
    sort { $a->{analysis}->{fave} <=> $b->{analysis}->{fave} } @top ) {
    my $fav = int $tweet->{analysis}->{fave} * 100 ;
    say $tweet->{text} ;
    say $tweet->{user}->{screen_name} ;
    say $tweet->{gen_url} ;
    say $fav ;
    say '' ;
    }

exit ;

#========= ========= ========= ========= ========= ========= =========
# gets the first page of your Twitter timeline.
#
# avoids checking a tweet if it's 1) from you (you like yourself;
#   we get it) and 2) if it doesn't give enough tokens to make a
#   prediction.
sub read_list {
    my $config = shift ;
    my $nb     = shift ;
    my $list   = shift ;

    ...

    my @favorites ;
    my $timeline =     $twit->list_statuses({list_id => $list});

    for my $tweet (@$timeline) {
        my $id          = $tweet->{id} ;                          # twitter_id
        my $text        = $tweet->{text} ;                        # text
        my $created     = handle_date( $tweet->{created_at} ) ;   # created
        my $screen_name = $tweet->{user}->{screen_name} ;         # user id
        my $check       = toke( lc $text ) ;
        next if lc $screen_name eq lc $config->{user} ;
        next if !scalar keys %{ $check->{attributes} } ;
        my $r = $nb->predict( attributes => $check->{attributes} ) ;
        my $fav = int $r->{fave} * 100 ;
        next if $fav < $config->{limit} ;
        my $url = join '/', 'http:', '', 'twitter.com', $screen_name,
            'status', $id ;
        $tweet->{analysis} = $r ;
        $tweet->{gen_url}  = $url ;
        push @favorites, $tweet ;
        }

    return @favorites ;
    }

#========= ========= ========= ========= ========= ========= =========
sub train {

    my $nb = Algorithm::NaiveBayes->new( purge => 1 ) ;
    my $path = '/home/jacoby/.nb_twitter' ;

    # adapted on suggestion from Ken to

    # gets all tweets in your baseline table
    my $baseline = get_all() ;
    for my $entry (@$baseline) {
        my ( $tweet, $month, $year ) = (@$entry) ;
        my $label = join '', $year, ( sprintf '%02d', $month ) ;
        my $ham = toke(lc $tweet) ;
        next unless scalar keys %$ham ;
        $nb->add_instance(
            attributes => $ham->{attributes},
            label      => ['base'],
            ) ;
        }

    gets all tweets in your favorites table
    my $favorites = get_favorites() ;
    for my $entry (@$favorites) {
        my ( $tweet, $month, $year ) = (@$entry) ;
        my $label = join '', $year, ( sprintf '%02d', $month ) ;
        my $ham = toke(lc $tweet) ;
        next unless scalar keys %$ham ;
        $nb->add_instance(
            attributes => $ham->{attributes},
            label      => ['fave'],
            ) ;
        }

    $nb->train() ;
    return $nb ;
    }

#========= ========= ========= ========= ========= ========= =========
# tokenizes a tweet by breaking it into characters, removing URLs
# and short words
sub toke {
    my $tweet = shift ;
    my $ham ;
    my $tokenizer = String::Tokenizer->new() ;
    $tweet =~ s{https?://\S+}{}g ;
    $tokenizer->tokenize($tweet) ;

    for my $t ( $tokenizer->getTokens() ) {
        $t =~ s{\W}{}g ;
        next if length $t < 4 ;
        next if $t !~ /\D/ ;
        my @x = $tweet =~ m{($t)}gmix ;
        $ham->{attributes}{$t} = scalar @x ;
        }
    return $ham ;
    }


Honestly, String::Tokenizer is probably a bit too overkill for this, but I'll go with it for now. It might be better to get a list of the 100 or 500 most common words and exclude them from the tweets, instead of limiting by size. As is, strings like ada and sql would be excluded. But it's good for now.

We get a list of tweets including a number between 0 and 1, representing the likelihood, by Bayes, that I would like the tweet. In the end, it's turned into an integer between 0 and 100. You can also run this against your normal timeline to pull out tweets you would've liked but missed. I often do this

I run the follows_me version on occasion. So far, it is clear to me that the people I don't follow, I don't follow for a reason, and that remains valid.

If you use this and find value in it, please tell me below. Thanks and good coding.

No comments:

Post a Comment