2011/03/28

Notes on Indiana Linuxfest

Indiana Linuxfest ran last Friday, Saturday, and Sunday, and I had the opportunity to attend on Saturday. I got there in time to catch most of the opening keynote (9 a.m.), so by evening I was too wiped to really do the party and see Dual Core, which saddened me, but I had to be not-sucky on Sunday.

Here are some notes:

  • The flyer should have a venue map so you know right off where you're going. The talks should be organized by day, and if there's going to be a presenter bio, the talk should have the presenter's name on it, so you know who is giving what talk.
  • The first presentation I saw was on Open Hardware, and I came in late. This sucks, because it was cool. It talked some about the Arduino, which I really need to get, because they're cool. Also mentioned were Beagle Boards (I think), which are cool because they're computers the size of a 3.5" floppy disk. Also the EZ430, a programmable watch from Texas Instruments, which is everything I thought the programmable Timex Iron Man watch was going to be. His last line was "We live in the future; we might as well start taking advantage of it." A man after my own heart.
  • Dan Klein, a Googler from Pittsburgh, gave a talk called "Frank Lloyd Wright was Right". In it, he told a story about his hometown, where Wright was asked to suggest a plan to fix some of Pittsburgh's architectural problems. His advice? "Raze it to the ground and start over." Don't just patch and make do with incremental change; start over and do it right this time. The presenter suggested this should be the way to take on many of the problems with computing infrastructure, as well.
  • Brian Proffitt talked about "The Death of the Linux Desktop (and I feel fine)". Great presentation by a great presenter, with good discussion.
  • The Bloomington Hackerspace, BloomingLabs, set up shop at the 'Fest, and that was cool. Hands-on is always cool. I got to pick my first padlock! Yay!
I could go on, but I won't. I hope they consider this enough of a win to do it again next year, and I hope and plan to make Ohio LinuxFest in September.

On The Importance of Unique Strings

I find generalized mail notices counterproductive. I get a lot of mail, from many dozens of mailing lists, from many dozens of commercial ventures I occasionally throw money at or want to throw money at, and more. I don't always pay attention to it. I write programs that check on things and send me email, and even then, I don't often read it. So why would I want every little thing to beep and pop up and eat my attention when it arrives? Especially when, as a programmer, so much of my problem-solving state just evaporates when that sort of beep pulls me out of it?

So, a while ago, I wrote jBiff, which announces when I have mail (the traditional role of biff and xbiff) via Jabber/XMPP (thus the "J"). I wrote it so I could configure search strings, so that I could have it biff when I got mail from:

  • My boss and coworkers - because I can name no worse feeling than hearing my boss come up behind me and ask "Did you get my email?"
  • My wife and kids
  • My parents
  • A select set of friends
  • The office voice mail
  • My bank - which tells me when I have run out of money
  • The Weather Channel - which tells me when there's severe weather in my area. (I work in a sub-basement so I can't just look out the window.)
And it's pretty much in that order. If you're top of that list, I'll get warned within two minutes of receiving the mail, and the bottom get checked every twenty minutes, all thanks to crontab.
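
For the curious, the staggering is nothing fancier than crontab scheduling. Here's a sketch of the sort of entries I mean; the script path and the --filter flag are hypothetical stand-ins, not jBiff's actual interface:

# check the high-priority filters every two minutes, the rest every twenty
*/2  * * * *  $HOME/bin/jbiff --filter urgent
*/20 * * * *  $HOME/bin/jbiff --filter everything-else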

So, I was surprised to get alerted that a boutique amp manufacturer had given away a 2x12" cabinet. I'm not remotely surprised at receiving this email (and no, I didn't win), but I am surprised that it popped up on Google Talk. Figuring out why took a little poking.

My eldest son is named "Niel". Not "Neil" or "Neal", but "Niel", as my grandfather was named "Niels" and I wanted to honor my history. And to ensure that nobody's going to spell his first name correctly on the first pass, to go with nobody ever being able to pronounce his last name. I can be cruel.

And the representative from the boutique amp manufacturer is named "Daniel".

Daniel.

Because I matched on my son's first name, not his whole email address, I got alerted. I have since fixed it, but as a reminder to both myself and others, I'm putting this out there: don't match a short string when you can match a longer, more unique one, or you'll get false positives.
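
A minimal sketch of the difference, with invented addresses standing in for the real ones:

#!/usr/bin/perl
use 5.010 ;
use strict ;
use warnings ;

# hypothetical From: headers -- the names and addresses are made up
my @from = (
    'From: Niel Example <niel@example.com>',
    'From: Daniel Somebody <daniel@amps.example.com>',
    ) ;

my $short  = qr/niel/i ;               # also matches inside "Daniel" -- false positive
my $unique = qr/niel\@example\.com/i ; # the whole address -- unambiguous

for my $header ( @from ) {
    say $header ;
    say "\tshort match:  " . ( $header =~ $short  ? 'yes' : 'no' ) ;
    say "\tunique match: " . ( $header =~ $unique ? 'yes' : 'no' ) ;
    }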

2011/03/24

Now There IS An App For That

I had to do something. This was something. So I did this.

This program collects the functions, module by module, from a directory full of modules, and checks a code base against them. That tells you which functions you are actually using. Which isn't quite what I wanted, but it's close.

I can see some additions I'd want. Setting the library directory and the code directories via Getopt::Long would be the first. And it doesn't quite tell me what I want to know in all cases: if I use a function in a program that never gets called, it still gives me a result. But this is a place to start.
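
Option handling along those lines might look like this minimal sketch; the option names and defaults are made up, not anything the program below actually supports yet:

#!/usr/bin/perl
use strict ;
use warnings ;
use Getopt::Long ;

# hypothetical defaults
my $lib_dir = '/path/to/my/lib' ;
my @code_dirs ;

GetOptions(
    'lib=s' => \$lib_dir,    # --lib /path/to/my/lib
    'dir=s' => \@code_dirs,  # --dir /code/one --dir /code/two (repeatable)
    ) or die "bad options\n" ;

@code_dirs = ( '.' ) unless @code_dirs ;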

And because of this, I now know that, within the stack of previously-invented wheels called CPAN, there's a module called Regexp::Common that holds a pile of established regular expressions. I wanted to pull all the comments out of my programs before checking, so that a commented-out function call doesn't count.

#!/usr/bin/perl

use 5.010 ;
use strict ;
use warnings ;

use Cwd 'abs_path' ;
use Regexp::Common qw /comment/ ;

use subs qw{
    check_programs
    module_list

    decomment

    drop_pm
    get_module_subs
    pull_package_name
    pull_module_name
    pull_sub_name
    } ;

my $modules     = module_list '/path/to/my/lib' ;
my $directories =  [
    '/code/directory/one',
    '/code/directory/two',
    '/code/directory/three', ] ;

my $data = check_programs( $directories, $modules ) ;

for my $mod ( sort keys %$data ) {
    my $module = $data->{ $mod } ;
    say $mod ;
    for my $sub ( sort keys %$module ) {
        my $subroutine = $module->{ $sub } ;
        say join "\t", '',
            ( $subroutine->{ count } ? $subroutine->{ count } : 0 ) ,
            $sub,
            ;
        }
    }

exit ;

########## ######### ######### ######### ######### #########
########## #########     Subroutines     ######### #########
########## ######### ######### ######### ######### #########

#--------- --------- --------- --------- --------- --------- ---------
# The core of the program
sub check_programs {
        my ( $directories, $modules ) = @_ ;
        my $data ;
        for my $program_dir ( @$directories ) {
            my $program_directory = abs_path $program_dir ;
            chdir $program_directory ;

            #say $program_directory ;
            my @directory = glob '*.cgi *.pl *.pm' ;

            my $programs ;
            @$programs = map {
                { $_ => get_program( $_ ) }
                } @directory ;
            for my $program ( @$programs ) {
                my $k ;
                ( $k ) = keys %$program ;
                my $v = $program->{ $k } ;

                #say join "\t", '', $k ;
                for my $module ( @$modules ) {
                    my $mk ;
                    ( $mk ) = keys %$module ;
                    my $mv = $module->{ $mk } ;

                    #say join "\t", '', '', $mk ;
                    for my $sub ( @$mv ) {
                        # \Q..\E treats the sub name literally, and \b keeps
                        # short names from matching inside longer ones
                        my $result = $v =~ /\b\Q$sub\E\b/ ? 1 : 0 ;

                        #$result
                        #    and say join "\t", '', '', '', $result, $sub ;
                        $data->{ $mk }->{ $sub }->{ exists } = 1 ;
                        if ( $result ) {
                            $data->{ $mk }->{ $sub }->{ count }++ ;
                            push @{ $data->{ $mk }->{ $sub }->{ used } },
                                abs_path $k ;
                                }
                        }
                    }
                }
            }
        return $data ;
    }

#--------- --------- --------- --------- --------- --------- ---------
# returns the decommented contents of a filename (the check for 'perl' in
# the first line is currently commented out below)
sub get_program {
        my ( $filename ) = @_ ;
        if ( -f $filename ) {
            if ( open my $fh, '<', $filename ) {
                my @lines =
                    map { decomment $_ } <$fh> ;
                close $fh ;

                #return 0 if $lines[0] !~ m/perl/ ;
                return join '', @lines ;
                }
            }
        return 0 ;
    }

#--------- --------- --------- --------- --------- --------- ---------
# removes Perl comments
sub decomment {
        my ( $code ) = @_ ;

        #chomp $code ;
        $code =~ s/$RE{comment}{Perl}// ;
        return $code ;
    }

#--------- --------- --------- --------- --------- --------- ---------
# returns an array ref of module names with an array of subroutines
# the module contains
sub module_list {
        my ( $dir ) = @_ ;
        my $directory = abs_path $dir ;
        chdir $directory ;

        my $output ;
        my @directory = glob '*.pm' ;

        @$output = map {
            {
                ( pull_module_name join '/', $directory, $_ . '.pm' ) =>
                    ( get_module_subs join '/', $directory, $_ . '.pm' )
                    }
            }
            map { drop_pm $_ } @directory ;

        return $output ;
    }

#--------- --------- --------- --------- --------- --------- ---------
# returns an array ref to all the functions (minus internal functions
# whose name starts with _) within a given module
sub get_module_subs {
        my ( $mod_path ) = @_ ;
        my @output ;
        if ( -f $mod_path ) {
            if ( open my $fh, '<', $mod_path ) {
                my @lines = <$fh> ;
                push @output, grep { !/^_/ }
                    map  { pull_sub_name $_ }
                    grep { /^\s*sub / } @lines ;
                close $fh ;
                }
            }
        @output = sort @output ;
        return \@output ;
    }

#--------- --------- --------- --------- --------- --------- ---------
# return the package name of a module
sub pull_module_name {
        my ( $mod_path ) = @_ ;
        my @output ;
        if ( -f $mod_path ) {
            if ( open my $fh, '<', $mod_path ) {
                my @lines = <$fh> ;
                push @output, map { pull_package_name $_ }
                    grep { /^\s*package / } @lines ;
                close $fh ;
                }
            }
        return $output[ 0 ] ;
    }

#--------- --------- --------- --------- --------- --------- ---------
# return the package name of a 'package Package::Name ;' string
sub pull_package_name {
        my ( $in ) = @_ ;
        chomp $in ;
        $in = ( split m{\s*package\s*}, $in )[ 1 ] ;
        $in = ( split m/\s*;\s*/,       $in )[ 0 ] ;
        return $in ;
    }

#--------- --------- --------- --------- --------- --------- ---------
# return only the subroutine name from a 'sub sub_name { ' string
sub pull_sub_name {
        my ( $in ) = @_ ;
        chomp $in ;
        $in = ( split m{\s*sub\s*}, $in )[ 1 ] ;
        $in = ( split m/\s*{\s*/,   $in )[ 0 ] ;
        return $in ;
    }

#--------- --------- --------- --------- --------- --------- ---------
# remove '.pm' from end of module file names
sub drop_pm {
        my ( $in ) = @_ ;
        $in =~ s/\.pm$// ;
        return $in ;
    }


Is There An "App" For That?

So, I've been developing a workflow for quite some time. I have SQL tables feeding data into Perl-generated web sites and JSON feeds feeding AJAX pages using jQuery, and if I think about it, I could probably feed two or three more technology buzzwords into there, too. CSS, for sure.

A big problem is that, as I proceeded, I had no idea what I was doing for a lot of it. What I know about writing Perl modules, I know because of this workflow. Well, I knew "start with package and end with 1 ;", but beyond that, not a whole lot. The web has helped this process a lot, and now I'm not too ashamed of what I have.

(Object orientation. Moose. It could use improvement and I would learn from the experience. But that isn't today's topic.)

So, by and large, I have modules handling the SQL queries, then passing hashes or hashrefs, depending on what I want. But, as I said, I was figuring out what I was doing as I went along. I'd get the problem, think "I need these tables", write modules to interface with the tables, write programs that use the modules, and fail to write tests for the modules, because what tests I could imagine are either trivially stupid or highly dependent on the state of the database. (I have Perl Testing on my desk and have worked through the first two chapters, so I know it's a failing, but see the note on object orientation.)

In a practical sense, if I'm dealing with a foo, I will have a table foo and maybe foo_attributes connecting with it. Then I'd write the module Foo.pm with create_foo, update_foo, delete_foo, read_foo and get_foo_list for a list of all the foos in the table. I can tell which tables I'm dealing with from the function name, for the most part. When it gets to the interaction of foos, bars and blees, it gets hairy, but that's why I get paid the big bucks, right?
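
To make that concrete, here's a stripped-down sketch of the shape those modules take; the db_handle() helper, the connection details, the table, and the columns are invented for illustration, not my actual code:

package VarLogRant::Foo ;

use 5.010 ;
use strict ;
use warnings ;
use DBI ;
use Exporter qw{ import } ;

our @EXPORT_OK = qw{ create_foo read_foo update_foo delete_foo get_foo_list } ;

sub create_foo {
    my ( $name ) = @_ ;
    my $dbh = db_handle() ;
    $dbh->do( 'INSERT INTO foo ( name ) VALUES ( ? )', undef, $name ) ;
    return $dbh->last_insert_id( undef, undef, 'foo', 'foo_id' ) ;
    }

sub read_foo {
    my ( $foo_id ) = @_ ;
    my $dbh = db_handle() ;
    return $dbh->selectrow_hashref( 'SELECT * FROM foo WHERE foo_id = ?',
        undef, $foo_id ) ;
    }

sub get_foo_list {
    my $dbh = db_handle() ;
    return @{ $dbh->selectcol_arrayref( 'SELECT foo_id FROM foo' ) } ;
    }

# update_foo and delete_foo follow the same pattern

sub db_handle {
    # hypothetical connection details
    state $dbh = DBI->connect( 'dbi:mysql:database=mydb', 'user', 'password',
        { RaiseError => 1 } ) ;
    return $dbh ;
    }

1 ;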

Right now, I'm looking to document the schema and DROP TABLE the cruft before adding a few more features. That means what I'd like to do is look through my code base for all the modules I'm responsible for, search for all the functions I'm calling in each, and get a list that might look something like this:
  • list_foo.cgi
    • VarLogRant::Foo
      • get_foo_list
      • read_foo
Now, I have a naive text-parsing concept of how I could do this, but honestly, I can't imagine that I'm the first person who would want something like this, and I know that the Perl community loves to build tools for building tools, so is there something I should be looking at before I dive into this myself?

2011/03/16

The Schwartzian Mindset

"Yeah, but your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should." (Dr. Ian Malcolm, Jurassic Park)

I have recently come to appreciate the power of map and grep when dealing with hashes and arrays of data.

The key to entering this mindset is the Schwartzian Transform. Here is the canonical example, as it appears on Wikipedia.
@sorted = map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] }
          map  { [$_, foo($_)] }
               @unsorted;
For years, I looked at that and said "Huh?", then wrote for my $i ( @unsorted ) { ... } or something like that. In fact, let's look at a for-loop implementation that would get similar results.
my %hash ;
my @sorted ;
for my $i ( @unsorted ) {
          my $j = foo($i) ;
          $hash{$j} = $i ;    # assumes foo() gives each element a distinct key
          }
for my $k ( sort { $a cmp $b } keys %hash ) {
          push @sorted, $hash{ $k } ;
          }

That's a lot more code, and it would not surprise me at all if my code was far slower than the Wikipedia example code, but I understood for loops from my first year of programming, so I can look at the second example and immediately understand it, and would still understand it if I came back and looked at it in a month.

Of course, you can comment this sort of thing to explain it to your later self.
@sorted = map  { $_->[0] } # strip off the sort key, keeping the original value
          sort { $a->[1] cmp $b->[1] } # sort by the precomputed key, the 2nd element of each anonymous array
          map  { [$_, foo($_)] } # pair each element with its computed sort key
               @unsorted; # the original unsorted array
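
And here's a concrete, runnable variant of the same shape, sorting filenames by size; the -s file test stands in for foo(), and the glob is just for illustration:

#!/usr/bin/perl
use 5.010 ;
use strict ;
use warnings ;

# sort the plain files in the current directory by size, smallest first,
# computing -s only once per file
my @unsorted = grep { -f } glob '*' ;

my @sorted = map  { $_->[0] }             # keep just the filename
             sort { $a->[1] <=> $b->[1] } # numeric sort on the cached size
             map  { [ $_, -s $_ ] }       # pair each name with its size
             @unsorted ;

say for @sorted ;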

I find myself using the map-and-grep approach a lot lately. It avoids a lot of extra arrays floating around, but having boggled at the Schwartzian Transform for years myself, I can see myself, or the next person holding this seat, looking at code like that for hours and failing to get it. But at least, now that I can code like that, I can wonder if I should...

2011/03/11

Computing and Simplicity

As I was preparing lunch the other day, I noticed two co-workers poking at a computer. One looked at me and asked, "Aren't computers supposed to make things simpler?"

No.

A thousand times, no.

Computers are supposed to allow us greater and more numerous avenues of possibility.

I say that all science is computer science these days because, if you could find out what you want to find out without handling large amounts of complicated data, without using computers to tell machines to manipulate things far smaller than you can handle by hand, and without modern instruments and the computers that control them, it was probably found out years ago.

I say that because computer networking allows for human networking: knowing people via Facebook and Twitter and Skype and wikis and chatrooms and whatever, which gives you connections based on interests, not proximity.

If you want simplicity, go out and find a plot of land. Plant seeds. Let it rain. The seeds grow. You harvest and eat the seeds, and plant some. Let that be the whole of your life. That's simple. And it's not for me. The whole of civilization has been built so that more and more people can justify not doing that.

(Lest anyone accuse me of slamming farmers, I have farmers in the family, and they more fully embrace and become more expert in more technologies than anyone else I know. They aren't keeping it simple, either.)

Of course, I said this.

Which is not what people struggling with computers want to hear. They want to hear how to make their process of using PowerPoint to generate HTML and/or images to put on the wiki actually work, and philosophical postulates that do not address the problem at hand just make them angry.

Of course, I noted that they had fallen victim to one of the classic blunders, the most famous of which is "Never get involved in a land war in Asia", but only slightly less well-known is "Never use Microsoft Office if you can help it."

This time, I wisely kept this thought to myself.