perl

Perl

Perl is usually expanded as "Practical Extraction and Report Language", although the name came first and the expansion was added later. It is designed for tasks that are too heavy for shell, and too complicated to code in C.

perl is highly portable. It runs on any unix like system that has a C compiler, and on most other platforms, since the package comes with a configuration script that pokes around the system looking for things it requires, and adjusts include files and defined symbols accordingly. Perl was first released in 1987 and became very popular in the 1990s, but is now getting overshadowed by python. Before perl came, awk and sed scripting languages were used. Perl was a big improvement over these, which contributed to its rise. However, syntax wise, python is easier for beginners than perl. I've included perl on this site, as many legacy programs at various companies are still written in perl, which you may need to debug, so knowing a little perl is going to be useful. However, if you are looking to learn a new scripting language, move to python. Python has a lot more support than perl, and is increasingly preferred for future scripts.

Unlike a shell pgm, perl is not a true line by line interpreter. It compiles the whole file before executing any of it, so a syntax error anywhere in the file stops the script before it starts. Thus it is both compiler and interpreter (just like awk and sed).

Link for beginners (There are a lot of other useful links for beginners on this site.): http://perl-begin.org/tutorials/perl-for-newbies/

Official perl documentation: https://perldoc.perl.org/

perl version: very important to verify version before starting to work, as syntax/features change a lot b/w versions. Perl version 5 and beyond have a lot more changes compared to earlier versions.

perl -v => returns v5.18.4 on centOS 6 running on my machine.

perl -V => (note capital V). This shows a lot more details such as compiler, library, flags, etc used to build perl on this system

simple perl pgm: test.pl => we name the file with extension .pl as a convention. Unix doesn't care about file extensions, as it's not used for anything.

#!/usr/bin/perl
use strict;
use warnings;

print "Hello ARGS = @ARGV $0 \n";

Save file above as test.pl, then type:

chmod 755 test.pl => This makes the file executable.

./test.pl cat dog => this gives "Hello ARGS = cat dog ./test.pl"

Basic Syntax:

Just like any other programming language, perl has variables to store diff data types, has reserved keywords or commands and special characters. These allow the language to do all sorts of tasks.


1. comments: start with # till the end of line. There is no separate multi line comment syntax (POD blocks, =pod ... =cut, are commonly used for that)


2. semicolon ; => all stmt terminated by ;


3. whitespace (spaces, tabs, newline, returns) => optional. whitespace is mandatory only if putting 2 tokens together can be mistaken for another token, else not needed. However, as we have seen with other scripting languages, we should always put whitespace to avoid ambiguity

4. curly braces {} => curly braces are used to group bunch of stmt into 1 block. Mostly used with control stmt.


5. parenthesis () => parenthesis for built in functions like print are optional. ex: print ("Hello\n"); print "Hello";

6. use <mod_name> <LIST>; => This function imports all the functions exported by MODULE, or only those referred to by LIST, into the name space of the current package. LIST is optional, but saves time and memory, when all functions in MODULE are not needed

ex: use Cwd qw(cwd chdir); => imports functions "cwd" and "chdir" from module Cwd

ex: use Time::HiRes "gettimeofday"; => imports function "gettimeofday" from module Time::HiRes

use strict; => this is a perl pragma that requires all var to be declared before being used (with "my" or "our", else it generates a compile error). A pragma is a directive to the compiler/interpreter on how to process its i/p (sort of cmd line options). "use" stmt applies the pragma at compile time, before the pgm starts running. strict is turned on implicitly only when the script asks for a minimum version of 5.12 or later (e.g "use v5.12;"), so otherwise code it explicitly.
ex: my $x=2; => "my" creates a new var that exists only inside the enclosing scope (block, sub or file). Code outside the scope can't see it, and an outer var with the same name is left untouched. This helps prevent conflicts from having same name in multiple places, and is useful in subroutines, as undeclared var are global by default. Since we used "strict" above, if my wasn't used to declare $x, then any reference to $x would be an error (i.e $x=2 is error). This helps to find typing errors. NOTE: $a and $b are special (sort uses them), so strict never complains about them; avoid them as ordinary var names. Restoring a var to its old value at the end of a scope is what "local" does, not "my".

ex: our $q; => "our" var is a package var, so it is accessible from any code that uses or requires that file/package by prepending the appropriate namespace, i.e $pkg1::q (if we used my $q, then $pkg1::q would not refer to it, as q wouldn't be accessible outside the file)

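use warnings; => This turns on warnings, and got introduced in perl 5.6. It has a similar effect as -w on 1st line of perl (#! ... -w), except that it applies only to the enclosing file or block, while -w applies to the whole pgm.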
reserved words are almost always lowercase. So, use uppercase for user defined var. var names are case sensitive. so var and VAR are 2 diff names.
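
The difference b/w "my", "our" and "local" is easier to see side by side. Below is a minimal sketch; the names $LEVEL, show_level, with_local and with_my are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $LEVEL = 1;            # package variable, also reachable as $main::LEVEL

sub show_level { return $LEVEL; }   # reads the package variable

sub with_local {
    local $LEVEL = 5;      # temporary value, restored when this sub returns
    return show_level();   # the callee sees 5
}

sub with_my {
    my $LEVEL = 9;         # new lexical variable, hides the package variable here only
    return show_level();   # the callee still sees the package variable
}

my $during_local = with_local();
my $during_my    = with_my();
print "local: $during_local, my: $during_my, after: $LEVEL\n";
```

Here "local" changes what the called sub sees, while "my" doesn't, and after both calls $LEVEL is back to its original value.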

Data types:

Perl has 3 data types: scalar, array of scalars and hash of scalars. Any var not assigned has undef value (treated as 0 in numeric context and empty string in string context). print function can print scalar, array and hash directly, but only scalar and array are interpolated inside " ... ", so a hash is usually printed one key/value pair at a time.

1. scalar: preceded by $ sign, and then a letter followed by letters,digits,_. It's single unit of data which may be int, float, char, string or a reference.  Data itself is scalar, so var storing it is scalar.
operators (as + or concatenate) can be used on scalars to yield scalars.
ex: $salary=25.1; $name="John Adf"; $num = 5+ 4.5; #perl interprets scalars to get correct computation. Note: 5+4.5 with no spaces also works.
ex: $camels = '123'; print $camels + 1; => prints 124 as scalars are interpreted automatically depending on operator

Scalar comes in 3 different flavors: number, string or reference.

A. Numbers: numbers can be int or float. Perl stores them internally as native int or double precision float, and converts b/w the two as needed, so we rarely need to care about the difference. number literal can be 1.25, -12e24, 3485, etc. These are called as constants in other pgm languages. perl supports octal, hex and binary literals also. Numbers starting with 0 are octal, those starting with 0x are hex, and those starting with 0b are binary.

$num ="129.7"; => here even though number is in double quotes and a string, it will be converted to number if numeric operator (i.e +) used


B. Strings: seq of char. Traditionally each char is an 8 bit value from the 256 character set (perl strings can also hold Unicode char). string literals come in these 2 flavors:
  I. single quoted strings: anything inside ' ... ' is treated as it is (similar to bash where it hides all special char from shell interpretation), except for 2 exceptions = backslash followed by single quote and backslash followed by backslash. backslash followed by anything else is still treated as it is.
    ex: 'don\'t' => this is converted to string don't. 'don't' gives a syntax error, as it treats don as a string and sees t' later which is not valid token.
    ex: 'hello\\n' => this is treated as hello\n (backslash followed by letter n, not a newline). 'hello\n' gives the same string, as \n is left as it is.
    ex: $cwd = 'pwd' => here the string pwd is stored instead of the dir, as single quotes don't run anything (backticks `pwd` would run the cmd)
  II. double quoted strings: acts like c string. It is similar to bash where all whitespace char are hidden from shell, but all other special char are still interpreted. Here backslash takes full power to specify special char as \n=newline, \x7f=hex 7f, etc. Also, variables as $x are interpolated in "...", while they aren't in ' .. '.
    ex: "coke\tsprite" => coke tab sprite => tab space is added b/w coke and sprite

NOTE: print function can have both single quotes or double quotes, and they are treated same way as above.

ex: $a='my name'; print $a; => prints var a, which is "my name"

ex: print "$a";=> will print "my name" as substitution done within " ... "

ex: print '$a'; => will print "$a" as special char are treated as is within ' ... '

C. Reference: This is explained below after array and hash.


2. array: preceded by @ sign and stores ordered list of scalars. array list @var also accessed via $var[0], $var[1], etc
ex: @ages = (25,30,40); print "$ages[0] $ages[1] $ages[2]" => 25 30 40
ex: $#ages => gives index value of last element of @ages, in this case 2 (since 0,1,2)
ex: $ages = @ages; => since $ages is scalar, length of array is assigned to $ages, which is 3. NOTE: $ages = (25,30,40) is different: with no array var involved, the commas act as comma operator and the last value (40) is assigned. scalar(@ages)=$#ages+1 => always true since scalar() returns length of array
ex: @names = ("john W", @ages, "amy r12", 1);
ex: print "@names"; => This will print the whole array (no need to separate it out into individual elements (as $names[0], etc)
ex: ($me,$lift)=@names; => sets $me="john W", $lift=25 (1st two elements of @names, since @ages got flattened into it). Remaining elements are ignored
ex: ($alpha, $omega) = ($omega, $alpha); => this swaps the 2 values, occurs in parallel (not like C)
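
The array examples above can be put together in one small script. This is only a sketch that reuses the same values, to show how scalar and list assignment differ.

```perl
use strict;
use warnings;

my @ages  = (25, 30, 40);
my $count = @ages;             # array in scalar context gives its length
my $last  = $#ages;            # index of the last element
my ($first, $second) = @ages;  # list assignment takes the leading elements

my @names = ("john W", @ages, "amy r12", 1);   # @ages is flattened into the list
my ($me, $lift) = @names;

my ($alpha, $omega) = ("A", "Z");
($alpha, $omega) = ($omega, $alpha);           # swap without a temporary

print "count=$count last=$last first=$first second=$second\n";
print "names=@names\n";
print "me=$me lift=$lift alpha=$alpha omega=$omega\n";
```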

various builtin functions available for array:
A. push/pop, shift/unshift, reverse, sort
ex: sort("Fred", "Ben", "Dino") => returns Ben, Dino, Fred.
ex: @guys=("Fred", "Ben"), others is a func that returns Dino. sort(@guys, others()) => returns same sorted list as above.
B. chomp(@ages); => removes the trailing newline (if any) from each element of array. chop(@ages) is the one that chops the last char, whatever it is.

C. qw => quote word function creates a list from non whitespace parts b/w (). Instead of bracket, we can also use other delimiters as {..}, /../, etc.
ex: @names = qw(john amy beth); => creates list "john","amy","beth". no " ... " or , required. list built by splitting on whitespace.

D. q => this returns single quoted string, no whitespace separation or interpolation of any var done. Just whole string returned as if it was in single quotes.

ex: $a = q(I am good $NAME is); => $a = 'I am good $NAME is' ($NAME is not substituted)

E. scalar(@ages); => returns num of elements in array. See ex above.
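
A short sketch of the array functions listed above, on made-up data:

```perl
use strict;
use warnings;

my @queue = qw(fred ben);
push @queue, "dino";              # add at the end
unshift @queue, "amy";            # add at the front
my $from_end   = pop @queue;      # removes and returns the last element
my $from_front = shift @queue;    # removes and returns the first element

my @sorted   = sort @queue;       # default sort compares as strings
my @reversed = reverse @sorted;

my @lines = ("one\n", "two\n", "three");
chomp(@lines);                    # strips a trailing newline from each element

print "sorted=@sorted reversed=@reversed\n";
print "lines=@lines count=", scalar(@lines), "\n";
```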

3. hash: preceded by % sign, and used to store sets of key/value pair. key/value pair may have single quotes, double quotes or no quotes as needed.
ex: %data = ('john p', 1, 'lisa', 25); print "$data{'john p'}"; => prints value 1. "john p" is the key.
    @dat1 = %data; => This assigns hash data to array dat1. So @dat1 = ('john p', 1, 'lisa', 25), though the pairs may come out in any order, as a hash is unordered. We can also assign one hash to other: %dat2=%data. %data=@dat1 converts array to hash (1st, 3rd, ... entries are keys, while 2nd, 4th, ... entries are values)
ex: %map=(); => clear the hash
ex: %map = (red=>0xff, green=>0x07, ...); #other way of assigning hash values
ex: $global{Name} = "Ben"; => Name is key, while Ben is value
ex: $GLOBAL{"db_path"} = "$GLOBAL{db_root}/$GLOBAL{version}/verification"; => substitution happens to provide complete path

ex: print %map; => This prints all keys and values run together with no separators, in no particular order, so it's rarely useful. Also, "%map" inside double quotes is not interpolated at all. To print a hash readably, use "each" (with a while loop) or "foreach" as below. If hash is passed into a sub, it gets flattened into the array @_, and printing "@_" will print the list separated by spaces.

various builtin functions available for hash:
1. keys: keys(%ages) => returns all keys. i.e returns all odd numbered elements of the flattened list (1st,3rd,5th,etc)
2. values: @num = values %ages; => returns all values, i.e all even numbered elements. note no brackets as they are always optional
3. each: returns the next key/value pair each time it's called, until all pairs of the hash have been returned
ex: while (($name, $age) = each(%data)) { print "$name $age\n"; } => each key/val pair assigned to var.
ex: foreach $key (keys %ages) {print $ages{$key};} => other way to access key/val pair
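
Both ways of walking a hash can be tried with the sketch below. The data is made up, and the keys are sorted only to make the print order predictable (a hash has no order of its own).

```perl
use strict;
use warnings;

my %ages = ('john p' => 41, lisa => 25, amy => 30);

# sort the keys so the output order is predictable
foreach my $name (sort keys %ages) {
    print "$name is $ages{$name}\n";
}

my $total = 0;
while (my ($name, $age) = each %ages) {   # visits every pair, in no particular order
    $total += $age;
}
my $pairs = keys %ages;   # keys in scalar context gives the number of pairs
print "pairs=$pairs total=$total\n";
```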

NOTE: => is used in hash, but one other use is as fat comma. It's a replacement for comma.

ex: Readonly my $foo => "my_car"; #here Readonly module (an alternative to constant) is called. Its syntax is 'Readonly(my $foo, "my_car")' to assign "my_car" to $foo as constant. Since ( ) around args in sub are optional, this can be written as 'Readonly my $foo, "my_car"'. Here , can be replaced with =>, resulting in ' Readonly my $foo => "my_car" '.


NOTE: hash can be converted to array which can be converted to scalar($age). Array is just collection of scalars stored in index 0 onwards ($age[0]), while hash is collection of scalars stored in array whose index values are arbitrary scalars($age{john}).
perl maintains every var type in separate namespace. So, $foo, @foo and %foo are stored in 3 different var, so no conflict.

typeglob: Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a typeglob is a * , because it represents all types. This used to be the preferred way to pass arrays and hashes by reference into a function, but now that we have real references (see below in subroutine section), this is seldom needed.

The main use of typeglobs in modern Perl is to create symbol table aliases.

ex: *this = *that; => this makes $this an alias for $that, @this an alias for @that, %this an alias for %that, &this an alias for &that, etc. Much safer is to use a reference, as shown below.

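ex: *var1 can be passed where a ref to array @var1 is wanted (older code does this), and @{*var1} gets the array back. It's not the same thing as \@var1 though: *var1 covers $var1, @var1, %var1, etc, while \@var1 is a ref to the array only.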

Another use for typeglobs is to pass filehandles into a function or to create new filehandles. If you need to use a typeglob to save away a filehandle, do it this way:

$fh = *STDOUT; #here we get ref to STDOUT by prefixing it with *, and store that ref in scalar $fh. We can also do it as a real reference like this: $fh = \*STDOUT; Now, $fh can be used instead of STDOUT, i.e

ex: print $fh "print this line"; #instead of "print STDOUT "print ...""

ex: use LogHandle; $fh = LogHandle->hijack(\*STDOUT); $fh->mute(); #here we are passing ref to STDOUT to sub "hijack" in package LogHandle. We get as return value a scalar $fh, and call methods on it using ->. NOTE: *fh->autoflush() is also valid syntax, but *fh is the typeglob named "fh", which is a different thing from scalar $fh unless the two were aliased.

1. Scalar: we talked about numbers and strings in scalars, but there's a third kind of scalar called reference.

C. Reference: references are similar to pointers in C. They hold the location of another value which could be scalar, array, or hash. Because of its scalar nature, a reference can be used anywhere a scalar can be used. Being scalar (since addr is a scalar quantity), a reference is stored as $var. Reference can be static or dynamic.

1. static reference is one where changes made to reference change the original value. To create a static reference to any var, precede that var by backslash (\).

$scalarref = \$foo; => Here we create ref (addr) for var $foo, by preceding it with \. Now, $scalarref has addr of $foo

$arrayref = \@ARGV; => for array

$hashref = \%ENV; => for hash

$coderef = \&handler; => function/subroutine

$globref = \*foo; => globref for foo (foo may be scalar, array, hash, func, etc.

Function reference: ex: sub print_m { ... }; $print_ref = \&print_m; &$print_ref(%hash); => calling func by ref. Useful in OO pgm.

Function ref(): This returns the var type of any reference. So, ref($mapref) returns HASH (since $mapref is referencing hash type). It can return SCALAR, ARRAY, HASH, CODE, etc. If arg is not a reference, then it returns empty string (which is FALSE).

2. Dynamic reference (perl docs call these anonymous arrays and hashes) are ones where a new copy of the data is made, and if changes are made through this reference, then the original var doesn't change with it. To create a dynamic reference, enclose the list within [ .. ] for array and within { .. } for hash. Mostly used with constants (i.e when we don't have a var assigned to store these constants)

array: Use [ ... ]

@ages = (25,30,40);=> stores array in var @ages.

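$ages = (25,30,40); => this does NOT store the array or its size. In scalar context the commas act as comma operator, and the last value is assigned. So, $ages=40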

$agesref = [25,30,40, ['q','a']]; => Since square brackets used, it creates copy of this and stores ref of that array in $agesref

$agesref = [ @ages ]; => this creates dynamic ref to array @ages

hash: Use { ... }

%data = ('john p', 1, 'lisa', 25); => stores hash in var %data.

$dataref = {'john p', 1, 'lisa', 25}; => Since curly brackets used, it stores ref of this hash in $dataref

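%map = (red=>0xff, green=>0x07, ..); => other way to store hash. NOTE: an inner list like blue=>(magenta=>45, ...) does not create a nested hash, as lists inside lists get flattened. Use blue=>{magenta=>45, ...} for that.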

my $mapref = {red=>0xff, green=>0x07, blue=>{...}, other=>[ ...], ...}; => Since curly brackets used, it creates copy of this ref and stores ref of this hash in $mapref. Note that inside we can have multi level hash/array (i.e other is a arrayref in this ex, since it has array in [ ... ]).

$hashref = { %data }; => this creates dynamic ref to a copy of hash %data. (NOTE: [ %data ] would instead give an array ref holding the flattened key/value list)

DeReference of var: Dereferencing means getting the var back from the addr. It's same for both static or dynamic ref. Use $,@ or % in front of ref var. We can use { ... } around the ref var for clarity or when the expression inside them is complex.

$scalarderef = $$scalarref; or ${$scalarref}; => putting a $ in front of the ref var, gets the value pointed to by that ref var. (or using ${$scalarref} is the same thing)

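@arrayderef = @$arrayref; or @{$arrayref}; => gets the whole array back. print @arrayderef prints all elements with no spaces, while print "@arrayderef" puts spaces b/w them. NOTE: assigning to a scalar ($n = @$arrayref) gives num of elements instead.

%hashderef = %$hashref; or %{$hashref}; => gets the whole hash back. print %hashderef prints all key+value (with no spaces)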

&$coderef(args); => function call using reference. To call function via indirect way, we do "&handler(args)". See in Function section below.
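
Below is a small sketch of static vs dynamic ref and of dereferencing, reusing the @ages and %data values from above. The anonymous sub is made up for this example.

```perl
use strict;
use warnings;

my @ages = (25, 30, 40);
my $same = \@ages;       # reference to the existing array
my $copy = [ @ages ];    # reference to a new anonymous array holding a copy

push @$same, 50;         # changes @ages itself
push @$copy, 60;         # changes only the copy

my %data    = ('john p' => 1, lisa => 25);
my $hashref = \%data;
my @keys    = sort keys %$hashref;   # dereference with % to get the hash back

my $coderef = sub { return scalar @_; };   # anonymous sub that counts its arguments
my $argc    = $coderef->(1, 2, 3);         # same as &$coderef(1, 2, 3)

print "ages=@ages copy=@$copy\n";
print "ref types: ", ref($same), " ", ref($hashref), " ", ref($coderef), "\n";
print "keys=@keys argc=$argc\n";
```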

arrow operator ->: An arrow operator is used in C pgm to access individual elements of struct pointer (reference to struct). i.e for struct person *p_ptr with element age, we do "p_ptr->age". We use similar concept in perl to access elements of array or hash reference.

1. array: use -> followed by [ .. ]. Inside [ ], enter index "n" (starting from 0), which will get the value of (n+1)th element of array.

ex: $cont = [ 1,2,"ab","cd"]; $cont->[3] refers to 4th element = cd

2. hash: use -> followed by { ... }. Inside { }, enter the "key", which will get the value corresponding to the key

ex: $cont = {"title me"=>"a", name=>"john c", addr=>{city=>"aus", zip=>12231} }; $cont->{"title me"} gets value "a". $cont->{addr}->{city} gets value "aus"

3. mixture of array and hash: use [ ]  for array and { }  for hash. For multilevel, we may omit subsequent -> after the first one.

ex: $cont = {"title me"=>"a", name=>"john c", addr=>{city=>"aus", zip=>12231, addr=>[{street=>"main"}, {house=>201}] } }; print "$cont->{addr}->{addr}->[1]->{house}" gets value "201"

4. class/subroutine (or object/method): use -> followed by method name and ( ... ) => We use ( ) for args of method, and not for method itself. this is used in OOP section later. An object is a reference to data that knows which class (package) it belongs to, so methods can be looked up in that package.

ex: $class1->new("matt",10); #Here $class1 holds the name of a package (class). We are calling subroutine named "new" in this class. With a literal package name, it's written as class1->new("matt",10)

ex: $obj->{name}; #here $obj is an object of that class, and is a ref to a hash. So, it's similar to case 2 above, where we get the value corresponding to key "name". NOTE: { ..} used here since it's referring to hash object.
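
The multilevel example from case 3 above, as a runnable sketch. It uses the same data, with all strings quoted so that it passes "use strict".

```perl
use strict;
use warnings;

my $cont = {
    "title me" => "a",
    name       => "john c",
    addr       => {
        city => "aus",
        zip  => 12231,
        addr => [ { street => "main" }, { house => 201 } ],
    },
};

my $city   = $cont->{addr}->{city};
my $house  = $cont->{addr}{addr}[1]{house};    # arrows between subscripts may be dropped
my $street = $cont->{addr}{addr}[0]{street};
my $n_addr = scalar @{ $cont->{addr}{addr} };  # braces needed to dereference an expression

print "$city $street $house ($n_addr entries)\n";
```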


operators: these operate on scalar or list, and return scalar or list. () defines precedence of operations in case of ambiguity.

scalar operators:
1. for numbers: arithmetic: +,-,*,/,**(exponent),%, comparison (returns true/false): <,>,<=,>=,==,!=
2. for strings:
   A. concatenation (.). ex: "hello"."world" => "helloworld"
   B. comparison: eq, ne, lt, gt, le, ge. ex: 7 lt 30 gives false, as 7 and 30 are treated as strings, and string "30" comes before string "7" as 3 has lower ascii code than 7. If numeric operator < was used, then it would return true as literals would be converted to numbers, and 7<30 is true.
   C. string repetition: consists of single lowercase letter x. ex: "fred" x 3 => "fredfredfred"
      ex: (3+2) x 4 => "5555" as 3+2=5 is treated as string since there's a string operator on it.

perl converts numbers to strings or viceversa depending on operator. If operator is numeric, operands are converted to numbers, and if operator is string, then numbers are converted to string. If a string doesn't look like a number, then it's treated as 0 (or as its leading numeric part, if any), and a warning is printed when warnings are on. So, even though perl doesn't have types for scalar, it uses operator type to figure out whether to treat a value as number or string.
ex: $name="john"; print $name+1; => here john can't be converted to number, so it counts as 0 and 1 is printed, with warning "Argument "john" isn't numeric in addition (+) "
ex: $name="123"; print $name + 1; => here 123 can be converted to numeric, so + is carried out and 124 printed. NOTE: spaces don't matter
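
The conversions described above can be checked with a sketch like the one below. The $SIG{__WARN__} hook is used here only to capture the warning text in a var instead of letting it go to STDERR.

```perl
use strict;
use warnings;

my $digits = "123";
my $sum    = $digits + 1;      # numeric operator: "123" is treated as 123
my $joined = $digits . 1;      # string operator: 1 is treated as "1"

my $warning = "";
my $odd;
{
    local $SIG{__WARN__} = sub { $warning = shift; };   # capture the warning text
    my $name = "john";
    $odd = $name + 1;          # "john" counts as 0 here, and a warning is raised
}

my $as_strings = (7 lt 30) ? "true" : "false";   # compares "7" with "30"
my $as_numbers = (7 <  30) ? "true" : "false";   # compares 7 with 30

print "sum=$sum joined=$joined odd=$odd\n";
print "7 lt 30: $as_strings, 7 < 30: $as_numbers\n";
print "warning: $warning";
```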

= is also an operator.  $a=17; a gets value=17, but this whole expression is also given value of $a (which is 17)
 ex: $a= ($b=15); => b is assigned 15, but then a is assigned value of ($b=15) which is $b which is again 15.

shorthand operators:
$a += 3; => $a=$a+3;
$str .= "me"; => $str = $str . "me";
$e = ++$a; => $a=$a+1;$e=$a; => prefix version of autoincrement, $e gets incremented value
$e = $a++; =>  $e=$a;$a=$a+1; => suffix version of autoincrement, $e gets non-incremented value

defined operator: scalar can be defined or undefined. undefined scalar, or scalar with null string ("" , i.e nothing within the string), number 0 or string "0" are all interpreted as FALSE when scalar is used in Boolean expr, while anything else is treated as TRUE. "defined" tells apart an undefined scalar from one that merely holds a false value.

ex: if (defined($args)) { ... } #We can omit brackets around args of function. so "if (defined $args)" is also valid


control stmt: control expr is evaluated as string.  If empty "" or "0" string, treated as false, everything else is true

1. if/else:
if ($ready) { $a=1;}
elsif ($cond) { ...} #elsif needs its own condition. NOTE: spelled elsif, not "else if" or "elif"
else { ... }

2. while/until: while => repeat while expr is true. until => repeat until expr becomes true (i.e while it is false)
while ($tcks <100) { $sum += ... }
while (@ARGV) { process(shift @ARGV); }

3. do/while: with while, if cond is false, loop will not execute even once. do/while causes loop to execute at least once
do { ... } while ($cnt <100);

4. unless: opposite of if. block runs only when expr is false
unless ($dest eq $home) {print ...;}

5. for/foreach: These can be converted into equiv while stmt.
for ($sold=0; $sold<100; $sold++) { ... }
foreach $user (@name) { if ($user ...) ... } => here each element of @name is assigned to $user and loop run for each element. Modifying $user modifies original list (since $user is an alias for the element and NOT a copy)

6. next/last: next allows to skip to next iteration, while last allows to skip to end of block, outside of loop
foreach $user (@user) {
  if ($user eq "root") {next;} #skip to next iteration
  if (...)             {last;} #comes out of loop
}
If we give the loop a label, then we can specify which loop to break out of by specifying the label.
LINE: while ($line = <FILE1>) { # this loop is named LINE
       last LINE if $line eq "\n"; # we get out of loop LINE when we encounter 1st blank line
       next LINE if $line =~ /^#/; # skip comment line

      do something .....
      }
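
The labelled loop above reads from a file handle. A self-contained sketch of the same idea, looping over a made-up list of lines instead of a file:

```perl
use strict;
use warnings;

my @lines = ("# header comment\n", "alpha\n", "# another comment\n", "beta\n", "\n", "gamma\n");
my @kept;

LINE: foreach my $line (@lines) {
    last LINE if $line eq "\n";     # stop at the first blank line
    next LINE if $line =~ /^#/;     # skip comment lines
    chomp(my $text = $line);        # work on a copy so @lines is left untouched
    push @kept, $text;
}

print "kept: @kept\n";
```

"gamma" is never reached, as the blank line before it ends the loop.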

7. goto: ex: goto LINE;

8. switch: For switch cmd to work, "Switch" module needs to be used. It is no longer shipped with recent perl versions, so it has to be installed from CPAN (along with the modules it requires). syntax is similar to other languages.

use Switch;

switch ($arg) {

 case "a"  {print "name"; .... }

 case /\w+/ {print "..."}

 else { print ...}

}


Built in functions: perl provides a lot of built in functions that are very helpful. Most of the times you can use these functions to write more complex ones

1. chop($x) => it takes a scalar var, and removes last char from string value of that var
ex: $x="hello"; $y=chop($x); => $x becomes hell. $y gets assigned the chopped char "o".

2. chomp($x); => removes only the newline char at end if present, else does nothing.

3. print/printf: printf is C like providing formatted o/p
ex: printf "a=%15s b=%5d c=%10.2f \n",$a,$b, $c; => string $a is printed in 15 char field, decimal number $b in 5 char field, fp num $c in 10 char field with 2 decimal places

4. split/join:
ex:@fields = split(/:/,$line); => split $line using : as delimiter and assign it to $fields[0], etc.
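
join is the reverse of split. A short sketch using a made-up passwd style line; sprintf (same formats as printf, but returns the string instead of printing it) is used so that the result can be stored in a var.

```perl
use strict;
use warnings;

my $line   = "root:x:0:0:root:/root:/bin/bash";
my @fields = split(/:/, $line);          # break the line apart on ":"
my $shell  = $fields[-1];                # negative index counts from the end
my $again  = join(":", @fields);         # put the pieces back together

my $report = sprintf("user=%-6s uid=%3d fields=%d", $fields[0], $fields[2], scalar @fields);
print "$report shell=$shell\n";
```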

5. system cmds: any linux cmd can be executed using system or backtick

A. system cmd: running unix cmds using "system" should be kept to a minimum, as it makes the perl script less portable and may break it for other users or other machines. If the cmd is given as a single string containing shell special chars (as > or *), perl hands it to the system shell (/bin/sh on unix), else it runs the pgm directly. Either way, the cmd itself has to exist on that machine and behave the same way there. Also, the return status of system cmd is 0 on success (any non zero value indicates a failure), which is the opposite of most perl functions, which return true on success. Other problem is that system cmd has 3 diff forms, and depending on which one is used, it may behave differently. So, avoid "system" where perl has a built in function for the job, as mkdir, chdir, etc.


system("date"); #
$status=system($cmd); #runs whatever $cmd is. $status is assigned 0 on success
system "grep fred in.txt >output";
system "cc -o @options $files"; #var substitution occurs

B. backtick or qx: any cmd inside backtick or qx is executed. backtick is an operator.
my $output = `script.sh --option`; #using backtick, cmds within `` are executed and whatever they write to STDOUT is returned (in this case stored in $output)
my $output = qx/script.sh --option/; #similar to above as qx/.../ same as ``

We have system cmds for cd, pwd, etc that we can execute using "system" or backtick. However, it's preferred to use perl provided modules for doing this, as they work across all platforms. These modules eventually end up making the system call, but do it cleanly.

1. getcwd: This gets current working dir. same as unix "pwd" cmd.

use Cwd qw(getcwd);

$cur_dir = `pwd`; => this returns unix pwd but has a trailing newline at end. This is stored in var $cur_dir

$cur_dir = getcwd; => same as above, except no newline at end

2. chdir: This changes dir to specified dir. same as unix "cd" cmd.
use Cwd qw(chdir);

$save_pwd_dir = `pwd`;

chomp $save_pwd_dir;

$status=chdir($save_pwd_dir);=> since `pwd` above returns newline at end, using it directly in chdir would return status of 0 (i.e error, so no cd happens). Only after we remove the newline by using "chomp" does the chdir cmd work and return status of 1 (i.e success)

$cur_dir= getcwd;

chdir($cur_dir); => This cmd works and returns status of 1, since there's no newline in $cur_dir (since getcwd cmd was used)

chdir("/home/ajay"); => Here we change to given dir, by directly specifying the name
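
A sketch of getcwd and chdir together. File::Spec->tmpdir is used here only as an assumption, to get some dir that should exist on the machine; any other existing dir would do.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

my $start  = getcwd();                 # no trailing newline to strip
my $target = File::Spec->tmpdir();     # some directory that should exist
my $ok     = chdir($target);           # true on success, false on failure
my $bad    = chdir("$target\n");       # a stray newline makes the path wrong

print "moved: ", ($ok ? "yes" : "no"), "\n";
print "with newline: ", ($bad ? "yes" : "no"), "\n";

chdir($start) or die "cannot return to $start: $!";
my $back = getcwd();
```

Checking the return value of chdir (as in the last chdir above) is what catches the stray newline problem early.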

6. here: It's not a function. It's the same "here" document as in bash script. syntax is "<<IDENTIFIER; .... Any Stmts .... IDENTIFIER". The same effect can be achieved with print stmt, but that will need multiple print cmds, one for each line. To interpolate variables in stmts use double quotes around "IDENTIFIER" (or no quotes, which behaves the same), else use single quotes 'IDENTIFIER'.

ex:below ex in perl script will print stmt1 and stmt2 on screen, since default for print is STDOUT. newlines if present in text are automatically printed.

print <<Foo;

My name is

You are ill)

Foo

ex:below will print the stmt in $file1 handle which is opened in write mode

open my $file1, '>', "file.txt" or die $!;

print $file1 <<My_text;

this is test;

My_text
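
The effect of quoting the identifier can be seen in the sketch below. $name and its value are made up for this example, and the text is stored in var instead of being printed directly.

```perl
use strict;
use warnings;

my $name = "Ajay";   # hypothetical value, used only for this example

my $interpolated = <<"END_TEXT";
My name is $name
Tabs\tand newlines are expanded here
END_TEXT

my $literal = <<'END_TEXT';
My name is $name
Nothing is expanded here, not even \t
END_TEXT

print $interpolated;
print $literal;
```

The closing identifier has to match the opening one exactly and sit alone on its line.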

 
subroutines:

Declare: ex: sub NAME1; #forward declaration of a subroutine NAME1. With a prototype, use sub NAME1(PROTO); All sub get their arg list in the @_ array, which can be used inside body of sub (@_ stores args, i.e $_[0], $_[1]). Each invocation of a sub gets its own @_, so nested sub can be called w/o these values getting overwritten. NOTE: elements of @_ are aliases to the caller's args, so assigning to $_[0] changes the caller's var; copy them into "my" var first to avoid that.
To declare and define all in one place, just add the block to it. i.e sub NAME1 {BLOCK } => NOTE: body is in { ... }, and no arg list is written in the definition, as @_ will store the args for any sub. If ( ... ) is put after the name in the definition, it's a prototype (or a signature in newer perl), not a plain list of arg names. So "sub NAME1 arg1 arg2 {BLOCK }" is NOT valid.


ex: sub say_hello { print "hello $what"; return $a+$b; } => any var used within sub are global by default (diff than conventional C pgm). To make a var private to the sub, declare it with my() i.e: my($sum, @arr, %a); my($n,@values)=0; my $a; "local" is different: it gives a global var a temporary value, which is restored when the sub exits. return value is what's specified with return, or else the value of the last expression evaluated (if print is the last thing done, then 1 is the return value). We can return any data type, i.e. scalar, array, hash. If we return more than 1 array or hash or a combo, then they get flattened into a single list and their separate identities are lost. In such cases we use references.

To call subroutine, 2 ways
1. direct calling: Here we call by directly providing name of sub with optional args. ex: NAME1; NAME1(list); NAME1 LIST; => any of these 3 ways is fine, but the forms w/o ( ) work only if the sub has been declared or defined before the call.
ex: $a=3+say_hello(); => here sub returns value of $a+$b
ex:
sub bg { my(@values) = @_; my @result; foreach my $v (@values) { push @result, $v*2; } return @result; } #doubling each value is just an example operation
@val = bg(1,2,3); => any number of args can be provided as @_ stores them in array (as many as needed). Note, sub above doesn't have any arg list in it's defn (it's implied). Return value is stored in array @val.
 
my $cont = get_contents(); => this stores return value from func in scalar $cont. If return value is array or hash, then conversion happens, as explained in array/hash section above.


2. indirect calling: Here we call with & in front of the name, or via a reference to the function (i.e pointer to addr of func). The & form was needed in very old versions of perl, but is not recommended now for ordinary calls. ex: &NAME1; (NOTE: &NAME1; with no brackets passes the caller's current @_ to the sub)
ex: &bg(1,2,3); => same o/p as above.

ex: $func_ref = \&bg; &$func_ref(1,2,3); => same o/p as above. here addr of func "bg" is stored in $func_ref, and we access "bg" by dereferencing $func_ref. $func_ref->(1,2,3) does the same.

Passing args to functions: Args can be any data type, and they can be passed via value or via reference. We pass them via reference when we want to alter the original arg itself, or to keep several arrays/hashes separate.

ex: $tailm = my_pop(\@a, \@b); Here array @a,@b are passed by reference, so @_ holds 2 refs, and whatever my_pop does to the arrays through those refs (e.g pop @{$_[0]}) modifies @a and @b too.
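
Below is one possible way such subs could look. It's only a sketch: this my_pop is a hypothetical version that pops the last element of every array passed in, and split_even_odd is a made-up sub showing how 2 arrays are returned w/o getting flattened into one.

```perl
use strict;
use warnings;

# Hypothetical helper: pops the last element of each array it is given by reference.
sub my_pop {
    my @array_refs = @_;            # each argument is a reference to an array
    my @popped;
    foreach my $aref (@array_refs) {
        push @popped, pop @$aref;   # modifies the caller's array through the reference
    }
    return @popped;
}

# Two lists returned together must also travel as references to stay separate.
sub split_even_odd {
    my (@even, @odd);
    foreach my $n (@_) {
        if ($n % 2 == 0) { push @even, $n; } else { push @odd, $n; }
    }
    return (\@even, \@odd);
}

my @a = (1, 2, 3);
my @b = (10, 20);
my @tails = my_pop(\@a, \@b);
my ($even, $odd) = split_even_odd(1 .. 6);

print "tails=@tails a=@a b=@b\n";
print "even=@$even odd=@$odd\n";
```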

 

module / package:

  1. module: A Perl module is a reusable collection of related variables and subroutines that perform a set of programming tasks. There are a lot of Perl modules (>100K) available on the Comprehensive Perl Archive Network (CPAN). You can find various modules in a wide range of categories such as network, XML processing, CGI, databases interfacing, etc. Each perl module is put in its own separate file named as file1.pm, having same syntax as perl file. It can be loaded by other pgm or modules, by using do, require or use.
  2. Package: Packages are perl term for namespaces. Namespaces enable the programmer to declare several functions or variables with the same name, and use all of them in the same code, as long as each one was declared in a different namespace. Packages are the basis for Perl's objects system (explained later). Our main perl script itself is in "main" package (so all var can be referenced as main::a, or just plain "a"). We switch package using "package" keyword. Then our namespace changes to package_name (until the end of file). Now we can use var in this package using new package namespace.

Diff b/w module and package: Although package and module are used interchangeably, they are completely different. package is a container (a separate namespace), while module is a perl file that can contain any number of namespaces. It doesn't need to have any kind of pkg declaration inside it. To load a module in another file, we use any of do/require/use keyword. "use dir1::File1" just loads a file named dir1/File1.pm. To remove confusion b/w these, perl programmers obey these 2 laws, so that package and module can be treated as same thing:

  1. A Perl script (.pl file) must always contain exactly zero package declarations.
  2. A Perl module (.pm file) must always contain exactly one package declaration, corresponding exactly to its name and location. So, every module goes with same package name.

I. writing your own module: Filelog.pm => pm means perl module

package Filelog; => makes Filelog module a package. We adhere to law 2 above (name of module file exactly same as package name, including case, or else calls as Filelog::open_my won't find the sub). So, now namespace is "Filelog" instead of main or anything else. All var/sub from here on will be in namespace Filelog, so names don't clash with other packages (with "use strict", var still have to be declared with my or our)

use strict;

my $LEVEL = 1; #set file level var $LEVEL to 1, so that any subroutine in this file can access it

sub open_my{ .... $a = shift; ... } => write subroutines for diff functions to do

1; => this is required to return a true value from this module to the calling pgm. Very recent versions of perl can skip it (when the module itself says "use v5.38" or later), but we keep it for backward compatibility

II. Using above module in other pgm: pgm1.pl (we do not need separate file for package, we can put all code for package "Filelog" in pgm1.pl too)
    
#!/usr/bin/perl
use strict;
use warnings;
 
use Filelog; => load Filelog module (file Filelog.pm, the name is case sensitive). "use" is the usual way, though "require" and "do" can also load a file. Since there is also a package declaration with same name, new namespace "Filelog" can be used.
 
Filelog::open_my("logtest.log"); #sub in modules called by using namespace separator (::). args within brackets passed to subroutine "open_my"
 
Filelog::log(1,"This is a test message"); #sub "log" in namespace "Filelog" with 2 args

$STDERR = LogHandle->hijack(\*STDERR); #this is other way of calling sub in package "LogHandle". See in package section later
 

Read cmd line args: All languages have a way of reading cmd line args. We can write our own code to get args or use a perl module for that. The Getopt modules (Getopt::Std, Getopt::Long) are very popular for getting args of a cmd line.

1. Regular way: All cmd line args in perl are stored in @ARGV array (the script name itself is not included). $#ARGV is the subscript of the last element of the @ARGV array, so num of args = $#ARGV+1. $0 stores the name of the script that we are running

ex: ./test.pl cat dog => here @ARGV stores "cat dog" array. so, $ARGV[0]=cat, $ARGV[1]=dog and so on. $#ARGV=1 (since num of args=2). $0 stores ./test.pl
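
A small sketch of the same example as runnable code. So that it gives the same result however it is started, it fills @ARGV by hand instead of reading a real cmd line (that assignment is only for illustration).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simulate "./test.pl cat dog"; in a real script perl fills @ARGV for us
@ARGV = ('cat', 'dog');

my $num_args = $#ARGV + 1;       # last index + 1, same value as scalar(@ARGV)

print "script name: $0\n";
print "last index: $#ARGV, num of args: $num_args\n";
for my $i (0 .. $#ARGV) {
    print "ARGV[$i] = $ARGV[$i]\n";
}
```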

2. Using Getopt::Std module: test.pl

use Getopt::Std; #load Getopt/Std.pm module

my %options=(); => we declare empty hash "options"
getopts("hj:", \%options); => We store args in hash "options" (passed by ref). Here we are capturing arg values specified via flags -h and -j. The : after j indicates that -j takes a value, while -h is a plain on/off flag (stored as 1 when present). So our cmd line is something like this "./test.pl -h -j my_help". There are many different ways of storing args via getopts. Look in perl doc.
print "option $options{h} , $options{j}\n";

run: ./test.pl -h -j amit => prints "option 1 , amit"
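
The getopts call above can be tried out without a real cmd line, as in this sketch. The @ARGV contents (including the extra arg rest.txt) are made up for the demo.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;                 # core module, provides getopts()

# Simulate "./test.pl -h -j amit rest.txt"
@ARGV = ('-h', '-j', 'amit', 'rest.txt');

my %options;
# "hj:" => -h is a plain flag, -j takes a value
getopts('hj:', \%options) or die "Usage: $0 [-h] [-j value] [file ...]\n";

print "option $options{h} , $options{j}\n";
print "left in \@ARGV: @ARGV\n";   # getopts removes the flags it has parsed
```

Anything that is not a flag (here rest.txt) stays in @ARGV, so the script can process it afterwards as a regular arg.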

Signal trap pragma:

use sigtrap qw(handler my_handler normal-signals); => This pragma is a simple i/f for installing signal handlers, so that when the program is about to quit abruptly, we can do a graceful exit, by having a sub execute on receiving the signal. Here "my_handler" sub is called on getting a signal. There are many signals such as INT, ABRT, TRAP, etc that cause a perl script to terminate. The last arg "normal-signals" says to employ this handler only for the normal signals INT, TERM, PIPE and HUP, and not for other signals.

sub my_handler {

   my $signal = shift; #gets the signal causing the pgm to terminate

   die " Pgm killed with signal $signal";

}
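
A sketch to see the handler fire, assuming a unix like system. The script sends INT to itself, and the handler (named note_signal here, a made-up name) records the signal instead of dying, so that the pgm can carry on and print what it caught.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use sigtrap qw(handler note_signal normal-signals);

my @caught;                      # signal names seen so far

sub note_signal {
    my $signal = shift;          # name of the signal, e.g. "INT"
    push @caught, $signal;       # record it rather than die, for the demo
}

kill 'INT', $$;                  # send INT to our own process id

# Give perl a moment to run the handler, in case delivery is delayed
my $tries = 0;
select(undef, undef, undef, 0.1) while !@caught && $tries++ < 10;

print "caught: @caught\n";
```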

special code blocks:

There are five specially named code blocks that are executed at the beginning and at the end of a running Perl program, if present in the pgm. These are the BEGIN, UNITCHECK, CHECK, INIT, and END blocks. These code blocks are not subroutines, even though they look like it. A "BEGIN" block is executed as soon as it has been compiled, before the rest of the file is even parsed, while an "END" block is run at the very end, just before the interpreter exits. Multiple BEGIN, END, etc blocks can be in same pgm. BEGIN and INIT blocks are executed in the order they appear in code, while UNITCHECK, CHECK and END blocks are executed in reverse order. Usually 1 BEGIN, 1 END block suffices.

ex:

END {
  my $program_exit_status = $?; #Inside END block, $? contains the value that the program is going to pass to exit()

  print "Exit status is: $program_exit_status\n"; #we can have any stmt here that we want to be executed at end

}
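
A small sketch to check the order in which multiple BEGIN and END blocks run. The @order array and the messages are made up for the demo.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our @order;                      # package var, filled in while the file is compiled

BEGIN { push @order, 'BEGIN 1' }
BEGIN { push @order, 'BEGIN 2' }

END { print "END 1 (defined first, runs last)\n" }
END { print "END 2 (defined last, runs first)\n" }

push @order, 'main';
print join(', ', @order), "\n";
```

The BEGIN entries show up in the same order as in the code and ahead of "main", while the two END messages are printed after everything else, END 2 before END 1.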


Format: Perl supports formatting so that scripting languages "sed" and "awk" may no longer be needed, as perl supports more complex formatting.

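format => defines a report format. write() then prints data in that format
ex: defining a format. keyword format NAME = <some format> . => the . must be alone at the start of a line, it ends the format
In the picture lines below, @<<<<< specifies a left justified text field 6 char wide (the @ counts as 1 char), and @< a field 2 char wide. The line right after each picture line lists the var that fill its fields.
format LABEL1 =
==========
| @<<<<< |
$name
| @< |
$state
==========
.
open(LABEL1, ">file.txt") or die "Cannot open file.txt: $!"; => filehandle name needs to be same as format name
($name,$state) = ...;
write(LABEL1); => this writes into file.txt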
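
A runnable sketch of the label format. The values ("Austin", "TX") and the scratch file from File::Temp are made up for the demo, and the second picture line is padded with spaces so that the box edges line up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our ($name, $state);             # var used by the format must be visible to it

format LABEL1 =
==========
| @<<<<< |
$name
| @<     |
$state
==========
.

my (undef, $path) = tempfile();  # scratch file instead of file.txt
open(LABEL1, '>', $path) or die "Cannot open $path: $!";
($name, $state) = ('Austin', 'TX');
write(LABEL1);                   # fills in the format and writes it to LABEL1
close(LABEL1);

open(my $in, '<', $path) or die "Cannot read $path: $!";
my @lines = <$in>;               # read the label back to show it
close($in);
unlink $path;
print @lines;
```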

Regular expressions: These are same as ERE we studied in Linux section. However, perl RE have slight variation from POSIX ERE. Perl RE have become so widely used, that when people say RE, they usually mean Perl RE. Perl RE basics are best explained here: https://perldoc.perl.org/perlre

Perl RE are way of describing a set of strings w/o having to list all strings in the set. All ERE regex still valid. Following are the Perl RE metacharacters:

  • dot . => matches any single char except newline
  • * => matches 0 or more of preceding item
  • + => matches 1 or more of preceding item
  • ? => matches 0 or 1 of preceding item
  • \ => backslash to escape next metachar
  • ^, $ => matches beginning or end of line
  • (), {}, [] => () is for grouping subexpressions, {m,n} and [abc], same as in ERE. These are treated as metachar, so use backslash to use them as literals
  • <> => used for named capture grps in conjunction with (), as in (?<year>\d+). This is different than ERE as ERE doesn't use this (GNU BRE uses \< and \> but for different purpose: word boundaries)
  • | => Or, or alternation. Used inside (), but may be used without () too.
  • - => used to indicate range inside []
  • # => starts a comment, but only when the x modifier is used (as in /pattern/x)

Above metachar are used for pattern matching, substitution, splitting, etc
ex: /foo/ => // is pattern matching operator looking for foo
while ($line = <FILE2>) {
 if ($line =~ /http:/) { print $line; } => matches pattern for http:. =~ is pattern binding operator asking it to match against $line
}
while (<FILE1>) { print if /http:/ ;} => our default is $_. With no =~, the pattern is automatically matched against $_. o/p exactly same as above

quantifier:
{min,max} => preceding item can match min number of times up to max number of times
+ => {1,} matches one or more of preceding item
* => {0,} matches zero or more of preceding item
? => {0,1} matches zero or one of preceding item

common patterns:
/[a-zA-Z]+/ => matches one or more letters
/[\t\n\r\f]/ => matches any of tab, newline etc. /\s/ is similar, but it also matches a space
/[0-9]/ => matches any digit. Same as /\d/ for ASCII text. /\d+/ matches one or more digits
/\d{7,11}/ => matches min 7 digits but no more than 11 digits. ex: telephone number
/[a-zA-Z_0-9]/ => matches any single word char. equiv to /\w/ for ASCII text. /\w+/ matches an entire word
/./ => matches any char whatsoever (except a newline). needs to be at least 1 char
/a./ => matches a followed by any 1 char (except newline) => matches all strings that have a in them, where "a" is not the last char
/\S\W\D/ => uppercase provides negation. \D means any non digit char
/(\d+)/=> match as many digits as possible and put it in var $1. If more (), they are stored in $2,$3 etc.
/\bFred\b/ => \b matches at word boundary. So, this matches "the Fred Linc", but not "Fredricks"
/^Fred/ => matches lines beginning with Fred. ^ is anchor for beginning of line, while $ for end of line
/Fred|Wilma|Bren/ => matches any of 3 names
/(..):(..)/ => matches 2 colon separated fields each of which is 2 char long
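
A few of the patterns above, tried on a sample string (the string is made up for the demo). In list context a match returns the captured grps, which is how $digits, $hh and $mm get filled here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'Call Fred at 5551234567 before 09:30';

my $has_fred  = ($line =~ /\bFred\b/)       ? 1 : 0;   # whole word Fred
my $fredricks = ('Fredricks' =~ /\bFred\b/) ? 1 : 0;   # no boundary after Fred
my ($digits)  = $line =~ /(\d{7,11})/;                 # 7 to 11 digits in a row
my ($hh, $mm) = $line =~ /(..):(..)/;                  # 2 char fields around a colon
my ($who)     = $line =~ /(Fred|Wilma|Bren)/;          # any of 3 names

print "Fred found: $has_fred, in Fredricks: $fredricks\n";
print "digits: $digits\n";
print "time: $hh hours, $mm min\n";
print "name: $who\n";
```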

pattern matching/substitution:

m/pattern/gimosx =>  m=matching. m is optional when / is the delimiter, as pattern matching is implied by default. gimosx are modifiers. g=match globally (find all occurrences), i=case insensitive matching.
ex: ($key,$val) = m/(\w+) = (\w+)/ => extracts key value pair from $_. In list context, a match returns the captured grps, so we assign with = here (=~ is only for binding the match to a var other than $_)

s/pattern/replacement/egimosx => s=substitution
ex: $paragraph =~ s/\bMiss\b/Mrs/g => substitute Miss with Mrs globally in $paragraph. Without =~, it works on $_

ex: $MAILBOX->{'ScriptName'} =~ s/.*\/// => it substitutes the script path name with just the script name (strips out everything before the last /). So, "../dir1/./file.txt" will return "file.txt". Useful when trying to get name of script from cmd line.
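
The match and substitution examples above as a runnable sketch. The sample strings are made up for the demo.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# s///g returns the number of substitutions made
my $paragraph = 'Miss Smith met Miss Jones in Mississippi.';
my $count = ($paragraph =~ s/\bMiss\b/Mrs/g);

# strip everything up to and including the last /
my $script = '../dir1/./file.txt';
$script =~ s/.*\///;

# match in list context returns the captured grps
my ($key, $val) = 'depth = 42' =~ m/(\w+) = (\w+)/;

print "$paragraph ($count changes)\n";
print "script: $script\n";
print "key=$key val=$val\n";
```

The \b on both sides of Miss keeps "Mississippi" untouched.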


Files:
----
To read/write files, we need to create an IO channel called filehandle. 3 automatic file handles provided: STDIN, STDOUT, STDERR corresponding to 3 std IO channels.
open (HANDLE1, "file1.txt") or die "Cannot open for Read: $! \n"; => Read file. $! contains error msg returned by OS
open (HANDLE1, "<$file1"); => same as above. Read file
open (HANDLE1, ">$file1"); => create file and write to it
open (HANDLE1, ">>$file1"); => append to existing file
close (HANDLE1); => need to close file

File test operator: -e is a file test operator. It takes a scalar operand that holds a file name (a filehandle works too)
-e $a => true if file named in $a exists
-r $a => true if file named in $a is readable, -w=writable, -x=executable, -d=is_directory

ex:
$name="index.html";
if(-e $name) {print "EXISTS";} else {print "ABSENT";};
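
A sketch that puts open, close and the file test operators together. It works in a scratch dir made by File::Temp so that nothing real gets overwritten, and it uses the 3 arg form of open with a lexical filehandle, which is the commonly recommended style in newer code. The file name and contents are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);       # scratch dir, removed at exit
my $file1 = "$dir/file1.txt";

my $before = -e $file1 ? 'EXISTS' : 'ABSENT';

open(my $out, '>', $file1) or die "Cannot open $file1 for Write: $!";
print $out "first line\n";               # create file and write to it
close($out);

open($out, '>>', $file1) or die "Cannot open $file1 for Append: $!";
print $out "second line\n";              # append to existing file
close($out);

open(my $in, '<', $file1) or die "Cannot open $file1 for Read: $!";
my @lines = <$in>;                       # read all lines
close($in);

my $after = -e $file1 ? 'EXISTS' : 'ABSENT';
print "before: $before, after: $after\n";
print "readable: ", (-r $file1 ? 'yes' : 'no'), ", lines: ", scalar(@lines), "\n";
```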

<> = Line reading operator
print STDOUT "type number";
$num = <STDIN>; => reads complete text line from std i/p up to first newline. That string is assigned to $num (including \n). <STDIN> returns undef when there's no more data to read (as in end of file). If STDIN is omitted (plain <>), perl reads from the files named on cmd line, or from STDIN if there are none
chomp($num); print STDOUT "num is $num\n"; => chomp removes the trailing \n from $num
chomp($num = <STDIN>); => this also works, as chomp acts on the var on LHS of = operator
@num = <STDIN>; => This stores all lines of input in array until CTRL+D is pressed (i.e EOF). Each line is stored separately in $num[0], $num[1] and so on ..

ex:
while (<>) { print $_; } => $_ is the default storage var, when no var specified. this is equiv to
while (defined($_ = <>)) { .. } => At end of file when there are no more lines to read <> returns undef
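
A sketch of the line reading operator and chomp. To keep it self contained, it reads from an in-memory filehandle (a filehandle opened on a string) instead of STDIN. The input text is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = "10\n20\n30\n";
open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!";

chomp(my $first = <$fh>);        # scalar context: reads 1 line, chomp strips \n
chomp(my @rest  = <$fh>);        # list context: reads all remaining lines
my $after_eof = <$fh>;           # nothing left to read, so this is undef
close($fh);

print "first=$first rest=@rest\n";
print "read after EOF is ", (defined $after_eof ? 'defined' : 'undef'), "\n";
```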

ex:
#! /usr/local/bin/perl -w
# -w above turns ON warnings
$num_args = $#ARGV + 1;
if ($num_args != 1) {
  print "\nUsage: def_report_nets.pl  name_of_def_file\n";
  exit;
}
open (DEF, "$ARGV[0]") || die "Cannot open $ARGV[0] for Read ...";
while (<DEF>) { #or while ($_ = <DEF>)
  if (/count : (\d+) ;/) {
    $count = $1; #$1 is assigned to whatever matches in first (). Here $1 = text matched by (\d+)
    $count_sum += $count;
    print DEF1 "count = $count, sum=$count_sum"; #write this into DEF1 file handle (assuming it's already open for write)
  }
}
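
The same loop as a self contained sketch. Instead of a real def file, it reads made-up sample text through an in-memory filehandle, and it prints to STDOUT rather than DEF1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for the contents of a def file
my $def_text = "net a count : 3 ;\nnet b has no number\nnet c count : 12 ;\n";
open(my $def, '<', \$def_text) or die "Cannot open in-memory file: $!";

my $count_sum = 0;
my @counts;
while (<$def>) {
    if (/count : (\d+) ;/) {
        my $count = $1;          # text captured by (\d+)
        push @counts, $count;
        $count_sum += $count;
        print "count = $count, sum=$count_sum\n";
    }
}
close($def);
```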

Object Oriented (OO):

perl is both a procedural language and an OO language. OO is not the best soln for every problem. It's particularly useful in cases where system design is already OO, and is very large and expected to grow. OO concept in perl is similar to those in other languages. OO system is either prototype based (as in Javascript) or class based (as in most other languages, as C++, Java, Python, etc). Inheritance, overloading, polymorphism, garbage collection are all provided in perl OO similar to other languages. Perl's built in OO is very limited, and many OO systems have been built on top of it, which are typically used (Moose is one such ex). For our purpose, built in OO for perl is good enough. class, method, object and attributes are 4 concepts related to OO. I've put this OO section to get some basics, but if you need to do OO, python is preferred (python is preferred in general over perl)

class: Class is a name for a category (like phones, files, etc). package explained above declares a class (i.e package Person; declares class "Person"). In Perl, any package can be a class. The difference between a package which is a class and one which isn't is based on how the package is used.

attribute: these are data var associated with the class. An instantiation of class, known as object, assigns values to these attributes.

method: This class (package) has var and sub that work on these var. Sub in this package that expect an object (or the class name) as their 1st arg are called methods.

class and method in OO term are thus package and subroutine that we studied earlier.

object: Let's create an instance of this class, which is known as object. When we create an object, we actually create a reference to attr/method in the class. All objects belong to a specific class (i.e we can define an object "LG_phone" belonging to class "phone"). We can have multiple objects for a given class. An object is a data structure that bundles together data and subroutines which operate on that data. An object's data is called attributes, and its subroutines are called methods. You can use any kind of Perl variable (scalar, array, hash) as an object in Perl. Most Perl programmers choose either references to arrays or hashes (ref to hashes are most common).

ex: Person.pm => this *.pm file name has to be same as package name, as "use Person" looks for a file named Person.pm

package Person; #creates class "Person"

sub my_new { #sub for creating an instance of this class. This is called a constructor, and is usually named "new", but can be anything. This constructor is just like any other method (most OOP languages have special syntax for constructors, but not for perl)

my $class = shift; #First arg passed to any method call is the method's invocant. When we call Person->my_new(...), perl automatically passes class name "Person" as first arg. So, $class holds the class name (not an object instance).

my $self = { Name => shift, ssn => shift }; # this class has 2 attr: Name and ssn. Here the object of this class is of ref to hash data type, but could have been a ref to any type = scalar, array, hash, etc. These 2 attr are the args to this method call. $self is a scalar storing ref to hash

#instead of shift, we could have also used @_. my ($class, $name, $ssn) = @_; my $self = { Name => $name, ssn => $ssn }; 

print "class is $class\n"; print "Name is $self->{Name}\n"; print "SSN is $self->{ssn}\n"; #print values: class=Person, Name=Mary, ssn=22345 (for the call in test.pl below)

bless $self, $class; #Turning a plain data structure into an object is done by blessing that data structure using Perl's bless function. W/O this, data structure won't become an obj. 1st arg to bless func is the reference to data, while 2nd arg is class. So, ref $self is blessed into class "Person". Otherwise $self remains ref to hash data, just like any regular hash ref.

#we can also combine, $self and bless in same line as below
#my $self = bless { Name => $args->{Name}, ssn => $args->{ssn} }, $class; (this form assumes the args were passed in as a hash ref stored in $args)

return $self; #we return the ref to hash. This is a scalar ref. This becomes the ref of the new object being created.

}

sub setName {

my ( $self, $Name ) = @_; #1st arg is always object ref, so we store in $self

$self->{Name} = $Name if defined($Name); #Now, we can access attr of object

return $self->{Name};

}

sub getName {

my( $self ) = @_;

return $self->{Name};

}

1;

test.pl => this file uses the above package

use Person; #Person package is now included
 
my $object = Person->my_new("Mary",22345); #we are passing args as list (scalar,array,hash) and not as reference. "Person" is also passed as arg so that the object created is associated with class "Person". $object is now a reference to hash data type containing 2 key/value pairs. It's associated with class "Person", so is little different than ref to regular hash data type, but still can be treated as "reference to hash data type" for most purpose.

#my $object = my_new Person("Mary",22345); #this is another way to create object (called indirect object syntax). It's discouraged in new code, as perl can misparse it

my $name = $object->{Name}; => This references the object "$object" and gets the value for key "Name" which is "Mary"

print $name; => prints Mary. Note that print "$object->{Name}" also works, as hash element access is expanded inside double quotes. Method calls are not: print "$object->getName()" only expands $object, which is an addr, so it prints something like "Person=HASH(0x3f4578a0)->getName()"

$name = $object->setName("James"); => sets $object->{Name} to "James". Could have done directly too via: $object->{Name} = "James", however using subroutines as part of a class is preferred, as it keeps the object organized, by having everything related to an object in 1 place.
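
The Person class and its test pgm squeezed into one runnable file, as a sketch. The package sits in its own block instead of a separate Person.pm, so "use Person" is not needed here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Person;

    sub my_new {                         # constructor
        my ($class, $name, $ssn) = @_;   # $class is "Person"
        my $self = { Name => $name, ssn => $ssn };
        return bless $self, $class;      # hash ref becomes a Person object
    }

    sub setName {
        my ($self, $name) = @_;
        $self->{Name} = $name if defined $name;
        return $self->{Name};
    }

    sub getName {
        my ($self) = @_;
        return $self->{Name};
    }
}

my $object = Person->my_new('Mary', 22345);
my $first  = $object->getName();
print "class: ", ref($object), "\n";     # ref() gives the class of a blessed ref
print "name: $first\n";

$object->setName('James');
print "name now: ", $object->getName(), "\n";
```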

Inheritance: Object inheritance is a common concept in OOP, so that any class can be derived from any other class. This is useful, if we want to add few more data or sub to an existing class. Instead of modifying an existing class or duplicating everything in existing class to create a new class, we inherit the old class, and just add new code in new class. The @ISA array achieves that.

package Bar; => new package Bar declared

use foo; => load existing class foo (from foo.pm)

our @ISA=qw(foo); => inherit foo into this package Bar. "our" is needed if "use strict" is ON

sub my_add { .... }; => we now add new subroutines to package Bar. All methods of original package "foo" can now be called on package Bar and its objects.

1; => return value for older perl pgm
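
A sketch of @ISA inheritance in a single file. The class names Foo and Bar, and the methods hello and my_add, are made up for the demo. Both packages sit in blocks in the same file, so no "use" is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Foo;                         # existing (parent) class
    sub new   { my ($class) = @_; return bless {}, $class; }
    sub hello { my ($self) = @_; return 'hello from ' . ref($self); }
}

{
    package Bar;                         # new (child) class
    our @ISA = ('Foo');                  # Bar inherits methods of Foo
    sub my_add { my ($self, $x, $y) = @_; return $x + $y; }
}

my $bar   = Bar->new();                  # new() is found in Foo via @ISA
my $greet = $bar->hello();               # inherited method
my $sum   = $bar->my_add(2, 3);          # method added in Bar

print "$greet\n";
print "sum: $sum\n";
print "Bar object isa Foo\n" if $bar->isa('Foo');
```

Bar has no new() or hello() of its own. Perl looks them up in the classes listed in @ISA.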

Misc modules:

1. reading excel files: This is very commonly used to import excel sheet data into perl program:
ex: read excel sheet from libre office .xlsx files => this makes use of OOP. Uses Spreadsheet::XLSX module, which is installed from CPAN (it's not part of the std perl install).

#!/apps/perl/5.14.2/bin/perl
use lib "/apps/perl/modules-1503/lib"; #this adds lib path to existing lib paths to search for modules
use Spreadsheet::XLSX; #here module Spreadsheet::XLSX (file Spreadsheet/XLSX.pm) is loaded (perl module is reusable package defined in a file)

my $spreadsheet = "$ENV{VERIFICATION}/my_testlist"; #here my_testlist is open office excel sheet
if (! -e "$spreadsheet") { #checking for existence of spreadsheet
    print "Spreadsheet $spreadsheet not found. Please try again.\n";
    exit 0;
}

my $excel = Spreadsheet::XLSX -> new ($spreadsheet, $converter); #$converter is an optional Text::Iconv object for charset conversion. It's not set up in this snippet, so it's undef here
foreach $sheet (@{$excel -> {Worksheet}}) {
    printf("Sheet: %s\n", $sheet->{Name});
    foreach $row (($sheet -> {MinRow} +1) .. $sheet -> {MaxRow}) { #skipping 1st row=title row
        $testname           = ($sheet -> {Cells} [$row][0]) -> {Val}; #0 means 1st col
        $rtl_count          = ($sheet -> {Cells} [$row][4]) -> {Val}; #4 means 5th col
        ... #do more processing
    }
}

Some useful perl cmds:

1. perl cmd to substitute and replace one pattern with some other pattern in multiple files (below cmds can be run on cmd line in a shell, as long as perl is installed):
perl -pi -e 's/old_pattern/new_pattern/g' dir1/subdir1/*.tcl => does it for one dir only. Quotes keep the shell from interpreting special char in the pattern
perl -e 's/old_pattern/new_pattern/g' -pi.backup $(find dir1 -type f) => does it for all files in dir1 and its subdir. (-pi with .backup creates backup of old original files with .backup extension). $(find dir1 -type f) is shell cmd substitution: works in bash and other POSIX shells, but not in csh/tcsh. File names with spaces will break it
ex: perl -pi -e 's/1p0/2p0/g' $(find . -type f) => replaces 1p0 with 2p0 in all files in current dir and its subdir.
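
What -pi.backup does can also be written inside a script, by setting the special var $^I and filling @ARGV. Below is a sketch of that idiom on a scratch file. The file name demo.tcl and its contents are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);        # scratch dir, removed at exit
my $path = "$dir/demo.tcl";
open(my $out, '>', $path) or die "Cannot write $path: $!";
print $out "set vdd 1p0\nset vss 0p0\n";
close($out);

{
    # Same effect as: perl -pi.backup -e 's/1p0/2p0/g' demo.tcl
    local ($^I, @ARGV) = ('.backup', $path);
    while (<>) {
        s/1p0/2p0/g;
        print;                           # goes into the file being edited
    }
}

open(my $in, '<', $path) or die "Cannot read $path: $!";
my $edited = do { local $/; <$in> };     # slurp whole file
close($in);

my $has_backup = -e "$path.backup" ? 1 : 0;
print $edited;
print "backup kept: $has_backup\n";
```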