When the STL Isn't Enough: Adding Perl to Your C++ Applications

Ken Fox
fox@vulpes.com

April 29, 1998

Introduction

This paper introduces a library, libperl++, which provides a safe, simple and complete interface to an embedded perl interpreter. What's that, you ask? An embedded perl interpreter is the same perl engine that normally runs your Perl scripts, but it's running inside your application instead of in a separate process. Big deal, you say? Imagine all the things you've been doing (or avoiding) in your C++ applications that are trivial in Perl. Imagine all of the reusable modules in CPAN that you've never been able to use with C++. Imagine how nice it would be to extend your applications with macros written in Perl. Embedded perl makes all this possible.

The only catch is that embedded perl uses a complex API for communicating with your application. You've got to understand the API before you can start using embedded perl. That's why libperl++ was developed. It makes using embedded perl much easier and safer. It also throws in quite a bit of support for common idioms such as using Perl as your application configuration system or macro language.

The core of libperl++ consists of a set of classes and templates that "wrap" perl data and completely insulate your application code from the perl internals. The wrappers provide you several benefits over just using the perl internals directly:

Wrappers eliminate a large number of programming errors by enforcing that the wrapped perl objects are used according to the perl API.
Wrappers simplify integrating perl with your application by providing nice syntax and many convenient methods.
Wrappers enable transparent mixing of built-in C++ data and perl data, including using perl's arrays and hashes to hold C++ data.
Wrappers reduce name conflicts when using perl by avoiding the inclusion of the perl header files into your application code.

The rest of libperl++ consists of: a class that interfaces with the perl interpreter itself; some helper classes for dealing with special Perl features like regular expressions; classes for implementing XS routines; macros and templates for making your C++ objects available to Perl code; and some support code for common uses of the library.

Getting Perl To Do Something

The most basic thing the library does is allow you to ask the perl interpreter to evaluate a chunk of Perl code. You'd expect something this basic to be simple. It is. Here is a complete program:

  #include 

  int main () {
    wPerl perl;
    perl.eval("print q(Hello, world!\n)");
    return 0;
  }

The class wPerl handles the overhead of starting, initializing and stopping the interpreter. The eval() method takes a chunk of Perl code and asks perl to run it. The result is then returned back to C++. (In this example the return value is ignored.)

Here's a more complex, useful example:

  #include 
  #include 

  int main () {
    wPerl perl;
    perl.use("LWP::Simple");

    wPerlScalar getstore = perl.subroutine("getstore");

    getstore("ftp://ftp.sunet.se/pub/lang/perl/CPAN/src/latest.tar.gz",
             "perl.tar.gz");

    cout << "fetched latest perl release as perl.tar.gz\n";
    return 0;
  }

This example was taken from the libwww module cookbook. None of the return values are checked so the code is not robust, but it does demonstrate several of the key features of libperl++. For the rest of this document, only program fragments will be shown, not complete working examples.

Currently, only one interpreter may exist at any given time, but multiple interpreters can be created and destroyed in sequence. In most of the examples, the Perl interpreter is wrapped by a local variable in main(). That will probably not be very convenient for your applications. You will probably want to use a global wPerl * variable.

A decent interface that supports multiple simultaneous interpreters, possibly in multiple threads, hasn't been implemented in libperl++ yet. Certainly this needs to be developed soon in order to support the new features of Perl 5.005. However, if your code only creates a single interpreter using new wPerl(), then your code will continue to work with all future versions of libperl++.
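
The one-interpreter-at-a-time rule can be sketched in plain C++ so that it compiles without libperl++. The Interpreter class, its is_live() method and the g_interp pointer are hypothetical names invented for this illustration; they are not the wPerl API, and the real library may enforce the rule differently.

```cpp
#include <stdexcept>

// Hypothetical stand-in for an interpreter wrapper; NOT the libperl++ wPerl class.
class Interpreter {
public:
    Interpreter() {
        if (live_) throw std::logic_error("an interpreter already exists");
        live_ = true;                       // "start" the interpreter
    }
    ~Interpreter() { live_ = false; }       // "stop" the interpreter
    Interpreter(const Interpreter&) = delete;
    Interpreter& operator=(const Interpreter&) = delete;

    static bool is_live() { return live_; }
private:
    static bool live_;
};
bool Interpreter::live_ = false;

// Application-wide handle, meant to be set once the interpreter exists.
Interpreter* g_interp = nullptr;

// Returns true if creating a second simultaneous interpreter is rejected.
bool second_instance_rejected() {
    try {
        Interpreter extra;
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In this sketch, creating interpreters one after another works, while creating a second one during the lifetime of the first throws.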

Working With Scalars

The interface to Perl scalars is provided by the wPerlScalar class. Many C++ built-in types are considered scalars in Perl, so libperl++ automatically converts from these C++ types to the equivalent perl scalar form. This greatly simplifies writing C++ code.

When converting perl scalars to C++ values, however, you must explicitly request the C++ type you want. Perl automatically performs any necessary conversions though, so you don't need to write any type checking code. In fact, it is extremely uncommon to write type checking code. Perl types are often, and sometimes surprisingly, changed as side-effects of other operations.

The functionality provided by the wPerlScalar class covers most of the Perl scalar operators and functions. A few extra methods are required, e.g. is_true(), because C++ compilers aren't able to differentiate as many contexts as Perl.

Example: Getting The Current Date

The first example just calls the Perl localtime() function to get the current date. The eval() method uses scalar context so localtime() returns a string.

  wPerlScalar t = perl.eval("localtime");
  cout << t.as_string() << '\n';

Example: Trivial Regular Expression Search

The next example uses a Perl regular expression to search the result.

  if (t.find("Apr")) {
    cout << "excellent time of the year!\n";
  }

Example: Prepending Data To A Scalar

Some methods exist because C++ is not able to perform some operations as fast as Perl. Prepending data to a scalar is one example. I'll give the Perl code first:

  my $str = "bar";
  $str = "foo" . $str;

The mechanical translation of this code to libperl++ is:

  wPerlScalar str = "bar";
  str = wPerlScalar("foo").append(str);

C++ creates two temporary values to execute this: one for "foo" and one for the return value from append(). This is quite wasteful. Libperl++ provides a prepend() method to solve the problem:

  wPerlScalar str = "bar";
  str = str.prepend("foo");

Only one temporary, "foo", is created. That temporary can be eliminated by using a wPerlScalar object instead of a char *.

Example: Passing Scalar Arguments To A Perl Subroutine

The last scalar example defines a Perl subroutine to add two numbers and then uses the subroutine in a simple calculation. Automatic type conversion makes it simple to mix perl scalars with C++ data.

  wPerlScalar add = perl.eval("sub { my($x, $y) = @_; $x + $y }");
  double x = (add(3, 2) + add(4, 2)).as_real();

Perl is obviously used in the add() function. Perl is not so obviously used in the + operator as well.

More About Type Conversion

Occasionally the C++ compiler can't figure out how to convert from a C++ built-in type to a perl scalar. This is annoying, but can be fixed by giving the compiler a not-so-subtle hint:

  unsigned short int x = 1, y = 2;
  add(wPerlScalar(x, wPerlScalar::Force_Integer),
      wPerlScalar(y, wPerlScalar::Force_Integer));

That code explicitly tells the compiler to build a perl scalar from the C++ integer values.

The ambiguous conversion problem also pops up when assigning values to wPerlScalar objects. Libperl++ has long, ugly assignment methods that you can use instead of the = operator. For example, instead of:

  wPerlScalar x = 0;
  wPerlScalar y = 1.0;
  x = 2;
  y = 3.0;

you can write:

  wPerlScalar x(0, wPerlScalar::Force_Integer);
  wPerlScalar y(1.0, wPerlScalar::Force_Real);
  x.set_as_integer(2);
  y.set_as_real(3.0);

Generally you will only need to resort to these techniques when you are working with enum, short, char or unsigned integer values.

Type conversion is the most difficult part of libperl++ because it involves C++ overloading features and type conversion operators. Thankfully, these problems are rare.
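
Here is a rough sketch of where the ambiguity can come from and how a tag argument removes it. Scalar is a hypothetical class with just two value constructors (long and double); it is not the real wPerlScalar, whose set of constructors is not shown in this paper. Only the Force_Integer and Force_Real names are borrowed from the text.

```cpp
#include <string>

// Hypothetical scalar type; NOT the libperl++ wPerlScalar class.
class Scalar {
public:
    enum ForceInteger { Force_Integer };
    enum ForceReal { Force_Real };

    Scalar(long v) : text_(std::to_string(v)), is_integer_(true) {}
    Scalar(double v) : text_(std::to_string(v)), is_integer_(false) {}

    // Tagged constructors: the tag alone selects the overload, so the first
    // argument is converted to the requested type without any ambiguity.
    Scalar(long v, ForceInteger) : text_(std::to_string(v)), is_integer_(true) {}
    Scalar(double v, ForceReal) : text_(std::to_string(v)), is_integer_(false) {}

    bool is_integer() const { return is_integer_; }
    const std::string& text() const { return text_; }
private:
    std::string text_;
    bool is_integer_;
};

Scalar from_short(unsigned short x) {
    // Scalar s(x);   // would not compile here: unsigned short converts
    //                // equally well to long and to double
    return Scalar(x, Scalar::Force_Integer);
}
```

An int or double argument matches one of the plain constructors exactly, which is consistent with the text's remark that the problem mostly shows up with enum, short, char or unsigned values.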

Working With Arrays

Arrays come in two flavors: heterogeneous, like Perl's arrays, or homogeneous, like the containers in the Standard Template Library. Perl is used to implement both, but the homogeneous array has a bit of extra code wrapped around it to enforce the extra restriction. Most of the Perl array functions, including map, grep and sort, have C++ equivalents.

Here's the basic Perl-like array:

  wPerlArray a;
  a.push(1);
  a.push(2.0);
  a.push("three");
  cout << "a[1] = " << a[1].as_string() << '\n';

The homogeneous STL-like array is similar:

  tPerlArray<int> a;
  a.push(1);
  a.push(2);
  cout << "a[1] = " << *a[1] << '\n';

This array only accepts integer values. The bracket operator returns a pointer to an integer so that a non-existent array element can be indicated with NULL.

The tPerlArray template is completely generic and can be used for any type of data, including C++ objects. It uses a placement constructor to copy values into the perl array and properly destroys values when removing elements.

  tPerlArrayI<int> b;
  b.push(3);
  b.push(4);
  cout << "b[1] = " << b[1] << '\n';

This array is similar to the previous example, but it is optimized to only hold integer-like values. This is a slight performance advantage because the value can fit directly in a perl scalar without needing additional memory. The bracket operator returns the value itself, not a pointer. -1 is used to indicate a non-existent element.

One other variation, tPerlArrayP, can be used to hold pointer values. It works exactly like tPerlArrayI, but returns NULL to indicate a non-existent element.
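
The two ways of reporting a missing element described above can be sketched in plain C++. TypedArray and IntArray are hypothetical classes that use std::vector as a stand-in for the underlying perl array; they only mimic the bracket-operator behavior the text describes for tPerlArray and tPerlArrayI, not the real implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical homogeneous array; a std::vector stands in for the perl array.
template <typename T>
class TypedArray {
public:
    void push(const T& value) { items_.push_back(value); }

    // Returns a pointer so that a missing element can be reported as nullptr.
    T* operator[](std::size_t index) {
        return index < items_.size() ? &items_[index] : nullptr;
    }
private:
    std::vector<T> items_;
};

// Hypothetical integer-only variant: returns the value itself and uses -1
// as the "no such element" marker.
class IntArray {
public:
    void push(int value) { items_.push_back(value); }

    int operator[](std::size_t index) const {
        return index < items_.size() ? items_[index] : -1;
    }
private:
    std::vector<int> items_;
};
```

One consequence of the sentinel design is visible in the sketch: a stored -1 cannot be told apart from a missing element, whereas the pointer-returning form has no such blind spot.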

Working With Hashes

Hashes also come in two basic flavors; however, there is also the type of the key to worry about. Libperl++ allows bit strings, strings, integers, and reals as hash keys. The difference between it and Perl is that Perl always converts the value to a string and the C++ wrappers don't. For example, if you use an integer hash key, the key will be the bit string value of the integer itself, not the printed representation of the integer. If the hash you create needs to be accessed from Perl, take care to ensure that the keys are always strings. Most of the Perl hash functions, including keys, values and each, have C++ equivalents.

Here's the basic Perl-like hash:

  wPerlHash a;
  a.set("one", 1);
  a.set("two", 2.0);
  a.set("three", "three");
  cout << "a{two} = " << a.get("two").as_string() << '\n';

The homogeneous STL-like hash is similar:

  tPerlHash<int> a;
  a.set("one", 1);
  a.set("two", 2);
  cout << "a{two} = " << *a.get("two") << '\n';

This template, the tPerlHashI template and the tPerlHashP template have the same differences from wPerlHash as the array templates have from wPerlArray.
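
To make the difference between a bit-string key and a printed key concrete, here is a small plain C++ sketch. raw_key() and printed_key() are hypothetical helpers, and the exact encoding is an assumption: the sketch simply uses the bytes of the int in host byte order, which may not match what libperl++ does internally.

```cpp
#include <cstring>
#include <string>

// Key built from the raw bytes of the integer (a "bit string" key).
std::string raw_key(int value) {
    char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    return std::string(bytes, sizeof value);
}

// Key built from the printed representation, which is what Perl code
// looking up $hash{42} would use.
std::string printed_key(int value) {
    return std::to_string(value);
}
```

The two keys for the same integer are different strings, so a hash filled from C++ with integer keys would not be found by Perl code that looks entries up by number.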

Working With Regular Expressions

Regular expressions, or regexps, are one of Perl's nicest features. There are several methods on wPerlScalar, such as find(), that can be used for simple searches and iteration. However, those methods compile a regexp each time they're used. This is very inefficient. A wPerlPattern object can be used to avoid this because it compiles its regexp exactly once. The compiled regexp can then be used over and over. For example, this code creates a regexp that is used several times to find words in an array:

  wPerlPattern word("/\\w+/");
  while (input.is_true()) {
    scalar = input.shift();
    if (scalar.apply_pattern(word)) {
      output.push(scalar);
    }
  }

The pattern object can also use the full range of Perl's regexp operations including transliteration, substitution and all of the associated flags.
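
The compile-once idea is not specific to Perl. The sketch below shows the same structure using std::regex from the C++ standard library as a stand-in for wPerlPattern; keep_words() is a hypothetical helper, and std::regex uses ECMAScript syntax rather than Perl's, so this illustrates the shape of the loop only.

```cpp
#include <regex>
#include <string>
#include <vector>

// Keeps only the strings that contain at least one word character.
std::vector<std::string> keep_words(const std::vector<std::string>& input) {
    // Compiled once, before the loop, instead of once per element.
    const std::regex word("\\w+");

    std::vector<std::string> output;
    for (const std::string& s : input) {
        if (std::regex_search(s, word)) {
            output.push_back(s);
        }
    }
    return output;
}
```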

Working With Subroutines and Closures

You've already seen a few examples of using subroutines and closures. The first example:

  wPerlScalar getstore = perl.subroutine("getstore");

asks the perl interpreter to fetch a subroutine called getstore from the top level package. If the subroutine doesn't exist, the method returns Perl undef.

Creating an anonymous subroutine is easy too. Here's an example from above:

  wPerlScalar add = perl.eval("sub {"
                                "my($x, $y) = @_;"
                                "$x + $y"
                              "}");

This is my preferred way of formatting a Perl subroutine embedded in C++ code. The C++ standard guarantees that adjacent string literals are concatenated into a single literal.
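
The concatenation can be checked without libperl++ at all. In the sketch below, kAddSub is a hypothetical constant holding the same Perl text as the example above.

```cpp
#include <string>

// Adjacent literals are joined by the compiler; nothing is inserted between
// them, not even whitespace or newlines.
const char* const kAddSub =
    "sub {"
      "my($x, $y) = @_;"
      "$x + $y"
    "}";

// The text that would be handed to the interpreter.
std::string add_sub_source() {
    return std::string(kAddSub);
}
```

Because no newlines are inserted, each Perl statement needs its own terminating semicolon, and a Perl # comment placed inside one of the pieces would run to the end of the whole string.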

When using anonymous subroutines, make sure you're using a modern version of perl; versions up to 5.004 had serious bugs and memory leaks.

Libperl++ has special syntax defined so that you can use a perl scalar just like a regular function call:

  wPerlScalar r = add(1, 2);
  int n = add(r, "3").as_integer();

There are some limitations with this syntax though. First, the Perl subroutine is always called in scalar context. Second, the syntax is pretty rigid: the library allows at most 10 scalar parameters to be passed to the subroutine, and a single scalar is always returned. If an array or hash is given, it is silently converted to a reference.

  wPerlScalar join = perl.eval("sub {"
                                 "my $sep = shift;"
                                 "join $sep, @_"
                               "}");

  cout << join(", ", 1, 2.0, "three").as_string() << '\n';

  wPerlArray a; a.push(1); a.push(2.0); a.push("three");
  cout << join(", ", a, 4, 5.0, "six").as_string() << '\n';

This produces the output:

  1, 2, three
  ARRAY(0xd5850), 4, 5, six

The last warning about calling subroutines is that prototypes are always ignored. An array is always passed as an array reference and a hash as a hash reference.

Shadow Objects

Wrappers actually come in two flavors. The common one is the one we've already examined in some detail. The other flavor is quite bizarre from a C++ perspective. It is used to shadow Perl variables. Here's how they are normally used:

  wPerlScalarShadow x = perl.scalar("Some::Module::x");
  x = 10;

This code looks up a scalar Perl variable known as $Some::Module::x and shadows it to the C++ variable x. Any assignment to x affects $Some::Module::x and vice versa. They share the same perl scalar value.

Normal wrappers have pass by value semantics whenever they are constructed or copied. For example, when a normal wrapper is passed into a subroutine, a copy of the value is made and it is the copy that the subroutine uses. The shadow wrappers have pass by reference semantics when constructed and pass by value semantics when copied. Using the subroutine example again, when a shadow wrapper is passed to a subroutine, the value is not copied and the subroutine is able to modify the original.

Here is a brief comparison of the two wrapper flavors.

  # Sample Perl code

  sub ModifyArgument {
    $_[0] = 1;
  }

  sub DontModifyArgument {
    my($arg) = @_;
    $arg = 1;
  }

  // Sample C++ code with identical semantics

  void ModifyArgument(wPerlScalarShadow arg) {
    arg = 1;
  }

  void DontModifyArgument(wPerlScalar arg) {
    arg = 1;
  }

This example only serves to help you understand shadow wrappers. You should probably not write code like this. It is better to use standard C++ references, i.e. wPerlScalar &arg, for handling output arguments because that is easier to read and has better performance. The shadow wrappers are necessary for sharing values between C++ and Perl code, but try to avoid them if possible because they are surprising to the casual reader.

Shadow wrappers have the same base name as the common wrapper, but end in the name Shadow. For example, wPerlArrayShadow is the shadow wrapper for shadowing array values.
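
For readers who want to experiment with the two flavors without libperl++, here is a toy model in plain C++. Value and Shadow are hypothetical classes built on std::shared_ptr; they model only the ModifyArgument / DontModifyArgument behavior compared above, and say nothing about how the real wrappers are implemented.

```cpp
#include <memory>

// Hypothetical normal wrapper: copying duplicates the underlying value.
class Value {
public:
    Value(int v = 0) : cell_(std::make_shared<int>(v)) {}
    Value(const Value& other) : cell_(std::make_shared<int>(*other.cell_)) {}
    Value& operator=(const Value& other) { *cell_ = *other.cell_; return *this; }
    Value& operator=(int v) { *cell_ = v; return *this; }
    int get() const { return *cell_; }
private:
    friend class Shadow;
    std::shared_ptr<int> cell_;
};

// Hypothetical shadow wrapper: construction shares the cell with its target,
// and assignment writes through to it.
class Shadow {
public:
    Shadow(Value& target) : cell_(target.cell_) {}
    Shadow(const Shadow&) = default;   // a copy still refers to the same cell
    Shadow& operator=(int v) { *cell_ = v; return *this; }
    int get() const { return *cell_; }
private:
    std::shared_ptr<int> cell_;
};

void modify_argument(Shadow arg) { arg = 1; }       // caller sees the change
void dont_modify_argument(Value arg) { arg = 1; }   // caller's value untouched
```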

Performance Considerations

Libperl++ has many features that are easy to use, but also have a fairly heavy performance cost. This section examines a simple, but fairly common, performance problem you might encounter.

At first glance, you may think the following C++ code runs much faster than the equivalent Perl code. You'd be wrong. The Perl code is actually faster. On the Sun Ultra 2, the Perl code runs 10% faster than the C++ code.

   // C++ code                         # Perl code

   wPerlScalar r = 0;                  $r = 0;

   for (int i = 0; i < 100000; ++i)    for ($i = 0; $i < 100_000; ++$i)
   {                                   {
      r = r + i;                         $r = $r + $i;
   }                                   }

The trouble with the C++ code is that it creates a lot more temporary values than the Perl code -- which means a lot more calls to malloc(). The following C++ code runs about 10x faster than the Perl code. (Possibly faster if you have a really good C++ compiler.)

   // C++ code                         # Perl code

   wPerlScalar r = 0;                  $r = 0;

   for (int i = 0; i < 100000; ++i)    for ($i = 0; $i < 100_000; ++$i)
   {                                   {
      r += i;                            $r += $i;
   }                                   }
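
The difference between the two loops above comes down to how many objects get built. The sketch below counts them for a hypothetical heap-backed class, Counted, which stands in for a wrapper whose every new value costs an allocation; it is not wPerlScalar and makes no claim about the actual timings.

```cpp
// Hypothetical heap-backed number that counts how many objects are created.
class Counted {
public:
    static long created;

    Counted(long v = 0) : value_(new long(v)) { ++created; }
    Counted(const Counted& o) : value_(new long(*o.value_)) { ++created; }
    Counted& operator=(const Counted& o) { *value_ = *o.value_; return *this; }
    ~Counted() { delete value_; }

    Counted& operator+=(long n) { *value_ += n; return *this; }  // in place
    long get() const { return *value_; }
private:
    long* value_;
};
long Counted::created = 0;

// Builds a new object for every sum.
Counted operator+(const Counted& a, long n) {
    Counted t(a);
    t += n;
    return t;
}

// Number of objects created by the "r = r + i" form of the loop.
long objects_for_plus(int n) {
    Counted::created = 0;
    Counted r;
    for (int i = 0; i < n; ++i) r = r + i;
    return Counted::created;
}

// Number of objects created by the "r += i" form of the loop.
long objects_for_plus_assign(int n) {
    Counted::created = 0;
    Counted r;
    for (int i = 0; i < n; ++i) r += i;
    return Counted::created;
}
```

With this class, the += loop creates a single object no matter how many iterations run, while the + loop creates at least one extra object per iteration (more if the compiler does not elide the copy of the return value).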

Of course, the way to get the best performance out of C++ is to avoid using Perl objects directly when regular C++ will work. The following C++ code runs much faster than the Perl code. Perl is no slouch though, so even though the C++ code is faster, you might not notice it in your application's overall performance. Making a habit of writing code like this won't win any points with future colleagues maintaining your code either.

   // C++ code                         # Perl code

   wPerlScalar r = 0;                  $r = 0;
   int temp_r = r.as_integer();

   for (int i = 0; i < 100000; ++i)    for ($i = 0; $i < 100_000; ++$i)
   {                                   {
      temp_r += i;                        $r += $i;
   }                                   }
   r = temp_r;

The performance advice given here only applies to doing things that either C++ or libperl++ can do quickly, i.e. as direct functions without having to use the Perl interpreter to evaluate. If you are doing a lot of Perl subroutine calls or string evaluation then the C++ code will by definition run only as fast as the equivalent Perl. The other condition to watch out for when tuning your application is when neither your C++ code nor the Perl code is the bottleneck. This frequently happens when doing intense I/O or large memory allocations.

Things To Avoid

Don't use any global static wPerl, wPerlScalar, wPerlArray, etc. objects because static C++ object initialization sequencing is a nightmare. You'll have cases where your scalars are initialized before your interpreter or other horrible things.
If you absolutely need to create static perl objects, you're probably better off just letting libperl++ start a perl interpreter whenever you need one. This is a long way from perfect because you won't be able to control when/how the interpreter gets created, but at least your static constructors will (probably) succeed. There is the static method wPerl::run() that will return (and possibly start) the default running perl interpreter.
Don't subclass the wrapper classes. The wrappers have been carefully written so that they don't impose any performance penalty by using them instead of the internal Perl data structures. To achieve this, they prohibit the use of virtual methods and virtual inheritance and don't use any data members other than the perl value itself. Unfortunately, these optimizations make sub-classing the wrappers much more difficult and error prone. The only reason that you should subclass the wrappers is if you need access to the internal Perl data structures.
Quoted strings can be difficult to send to perl because C++ sees them before Perl does. Using an alternate quote character for your Perl strings helps out a great deal:
```
  perl.eval("print qq(hello, world\\n)");
```
is much better than:
```
  perl.eval("print \"hello, world\\n\"");
```
The rules for building a Perl expression, including those for building strings, are exactly the same when storing Perl expressions in C++ strings as they are for Perl scripts. However, the C++ compiler might throw a few surprises at you. Take the following code:
```
  perl.eval("print q(\n)");
```
If you're reading this code with your brain in Perl mode, you might think it will print a back slash followed by the letter n. Perl would, if that was what it saw. C++ parses the string first and converts all back slash notation before Perl even sees the string. This example actually prints a newline.
Code that uses shadow wrappers (wPerlScalarShadow, wPerlArrayShadow and so on) can be difficult to understand. As a general rule of thumb you should only use shadow objects to shadow a Perl variable that you want to access from both C++ and Perl.
C++ values that are assigned to perl scalars are always copied. It is perfectly fine to create a wPerlScalar from a char [] allocated on the stack, as long as the string has been initialized. The following code has serious problems however:
```
  wPerlScalar str = new char[20];
```
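When perl creates a new scalar, it uses strlen() to compute the initial length. In this case the buffer is uninitialized, so it is not guaranteed to be NUL-terminated and the result of strlen() is undefined. Even if the code doesn't crash, a memory leak will occur, because the scalar copies the data and the pointer returned by new is lost.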
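
Two of the string pitfalls in this list can be checked in plain C++ without an interpreter. In the sketch below, kNewlineInside, kBackslashN and copy_initialized() are hypothetical names, and std::string stands in for a scalar that copies its input.

```cpp
#include <cstring>
#include <string>

// The characters the interpreter would actually receive for each literal.
const char* const kNewlineInside = "print q(\n)";   // q( + newline character + )
const char* const kBackslashN    = "print q(\\n)";  // q( + backslash + n + )

// Copying from a stack buffer is fine once the buffer holds a terminated string.
std::string copy_initialized() {
    char buffer[20] = "hello";     // remaining bytes are zero-filled
    return std::string(buffer);    // the copy outlives the stack buffer
}
```

The first literal is one character shorter than the second: the C++ compiler has already turned \n into a newline, so Perl never sees a backslash. Doubling the backslash is what lets Perl's q() see the two characters \ and n.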

Last Words

libperl++ is still evolving and growing to fit the needs of its users. The basic features have stabilized though, and it is being used to implement several production applications. I'm very interested in hearing feedback from people using the library. If you have any comments, please mail them to me.

Hopefully the safety and simplicity of libperl++ will encourage more programmers to embed perl into their applications. That's good for everybody.