Wednesday, May 30, 2012

Computing Descriptive Statistics with Perl


For anyone who does data analysis work, computing basic descriptive statistics is often essential. As with most things, Perl has a CPAN module that makes computing basic statistical values straightforward. In this short script we will look at the CPAN module Statistics::Descriptive (http://search.cpan.org/dist/Statistics-Descriptive/lib/Statistics/Descriptive.pm) and use it to perform some basic statistical analysis of a numeric data set. The data set in the script consists of 100 randomly generated integers in the range of 50 to 150, inclusive. The mean, median, mode, standard deviation, minimum, and maximum of the data set are then computed.

#!/usr/bin/perl

# Copyright 2012- Christopher M. Frenz
# This script is free software - it may be used, copied, redistributed, and/or modified
# under the terms laid forth in the Perl Artistic License

use strict;
use warnings;
use Statistics::Descriptive;

#generate 100 random integers from 50 to 150 inclusive
#rand($range) is at least 0 and less than 101, so int() of the sum gives 50..150
my $range=101;
my $minimum=50;
my @randnums = map { int(rand($range)+$minimum) } ( 1..100 );

#prints the random numbers
#to prove the random number generation worked
foreach my $randnum (@randnums){
    print "$randnum\n";
}

#computes basic statistics on data
my $stat=Statistics::Descriptive::Full->new();
$stat->add_data(@randnums);
my $mean=$stat->mean();
print "The mean is: $mean\n";
my $median=$stat->median();
print "The median is: $median\n";
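#mode() returns undef when no value occurs more than once
my $mode=$stat->mode();
print "The mode is: ", (defined $mode ? $mode : "none (no repeated value)"), "\n";
#standard_deviation() divides by n-1 (sample standard deviation)
my $sd=$stat->standard_deviation();
print "The standard deviation is: $sd\n";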
my $min=$stat->min();
print "The minimum is: $min\n";
my $max=$stat->max();
print "The maximum is: $max\n";