Tuesday, November 25, 2014

InterSystems Caché - Approaching Community Package Manager - Part II

This is the second part of my long post about package managers in various operating systems and language distributions. By now you have probably "bought the idea" that a convenient package manager and a 3rd party code repository are the key factors in establishing a live and popular ecosystem. In this second part we will discuss an action plan for creating a package manager for the Caché database environment.

So let us try to estimate how much we would have to implement to add some basic package management facilities to the Caché ecosystem. Should we change anything in the kernel, or could it be done as an external service? What is the minimum functionality needed at the beginning to establish anything resembling a package management repository that would still be useful?


The 1st question to answer: what makes up the “package”? What about the simplest possible case, when only Caché classes are to be deployed? How do we keep multiple file types? In the ideal case some ZIP container could be used, but in the simplest case even a plain XML file, such as a Caché Studio project export, could serve the purpose, because even now we can embed all supported file types (CLS, RTN, INC, CSP, ZEN, CSS, GIF, etc.) in such XML export files. Not everything can be added to a project through the Studio UI, but AFAIK far fewer restrictions apply if we use the Studio API classes.

Yes, XML export is very inefficient and bloated, and although it can handle binary files as base64-encoded content, it generates large files. For the initial implementation, though, we could ignore this inefficiency.
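
To put a number on the base64 part of that overhead: base64 writes every 3 bytes as 4 characters, so binary content grows by about a third before any XML wrapping is counted. A quick check in Python (used here only for illustration; the payload is random bytes standing in for a binary resource, not a real Caché export):

```python
import base64
import os

def base64_overhead(payload: bytes) -> float:
    """Return encoded size divided by raw size for a binary payload."""
    encoded = base64.b64encode(payload)
    return len(encoded) / len(payload)

# 30,000 random bytes stand in for a GIF or another binary file.
sample = os.urandom(30_000)
ratio = base64_overhead(sample)
print(f"base64 size ratio: {ratio:.3f}")
```

The ratio stays close to 4/3 regardless of the content, which is why the inefficiency is tolerable for a first implementation but a ZIP container is the better long-term format.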

Metadata file

The 2nd question to answer: what should we put in the metadata? Certainly there should be “dependency information” (to make a recursive install of all dependent packages possible), but what else?
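
To show why dependency information is the essential part, here is a minimal sketch of how a recursive install order could be derived from it. This is an illustration in Python under my own assumptions, not a proposed implementation: the function name `install_order`, the dictionary layout, and all package names except DeepSee-Mobile are hypothetical.

```python
from typing import Dict, List

def install_order(package: str, deps: Dict[str, List[str]]) -> List[str]:
    """Return packages in the order they must be installed (dependencies first)."""
    order: List[str] = []
    visiting: set = set()   # packages on the current recursion path
    done: set = set()       # packages already placed in the order

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"circular dependency involving {name!r}")
        visiting.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    visit(package)
    return order

# Hypothetical catalog: package name -> names of packages it depends on.
catalog = {
    "DeepSee-Mobile": ["JSON-Utils", "REST-Client"],
    "REST-Client": ["JSON-Utils"],
    "JSON-Utils": [],
}
print(install_order("DeepSee-Mobile", catalog))
```

Each package appears once, after everything it depends on, and a circular dependency is reported instead of recursing forever.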

Here is an example of metadata from some abstract CPAN module using the ExtUtils::AutoInstall package functions. It is not usually part of a distribution, but it has a handy facility for dependency tracking:

use inc::Module::Install;

name     'Joe-Hacker';
abstract 'Perl Interface to Joe Hacker';
author   'Joe Hacker <joe@hacker.org>';
include  'Module::AutoInstall';

requires 'Module0';          # mandatory modules
feature  'Feature1',
    -default => 0,
    'Module2' => '0.1';

auto_install(
    make_args => '--hello',  # option(s) for CPAN::Config
    force     => 1,          # pseudo-option to force install
    do_once   => 1,          # skip previously failed modules
);

WriteAll;

Here is an example of an NPM package dependencies description:

{ "dependencies" :
 { "foo" : "1.0.0 - 2.9999.9999"
 , "bar" : ">=1.0.2 <2.1.2"
 , "qux" : "<1.0.0 || >=2.3.1 <2.4.5 || >=2.5.2 <3.0.0"
 , "asd" : "http://asdf.com/asdf.tar.gz"
 , "til" : "~1.2"
 , "elf" : "~1.2.3"
 , "lat" : "latest"
 , "dyl" : "file:../dyl"
 }
}

As a rough approximation, we could start with the JSON format used by NPM packages. However, for simplicity's sake, until we have better JSON support in the kernel, we could start with an XML metadata file.

Honestly, I hate XML, but let's face it: XML and JSON are quite interchangeable, being 2 different ways to serialize hierarchical information. Whatever is described in JSON can be similarly described in XML, and vice versa.
Once we have better JSON support in the product, we could easily switch gears and use JSON instead of XML for the metadata file.
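
As an illustration of that interchangeability, the sketch below reads a small XML metadata file and produces the same structure an NPM-style JSON file would carry. The XML layout (element and attribute names) is purely my assumption for the example; no such schema exists yet.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical metadata layout; element and attribute names are illustrative.
XML_METADATA = """
<package name="DeepSee-Mobile" version="0.1.0">
  <dependencies>
    <dependency name="foo" version="1.0.0 - 2.9999.9999"/>
    <dependency name="bar" version="&gt;=1.0.2 &lt;2.1.2"/>
  </dependencies>
</package>
"""

def xml_to_npm_style(xml_text: str) -> dict:
    """Convert the XML metadata into an NPM-like dictionary."""
    root = ET.fromstring(xml_text.strip())
    deps = {
        dep.get("name"): dep.get("version")
        for dep in root.findall("./dependencies/dependency")
    }
    return {
        "name": root.get("name"),
        "version": root.get("version"),
        "dependencies": deps,
    }

metadata = xml_to_npm_style(XML_METADATA)
print(json.dumps(metadata, indent=2))
```

Nothing is lost in either direction, so starting with XML does not lock the repository format in.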

Dependency on system classes

When we talk about dependencies on classes there is an interesting problem to address: how do we mark a dependency on some “built-in” Caché/Ensemble/HealthShare/TrakCare class(es), which may have been introduced in some particular version? And in general, how do we denote a dependency on anything from %CACHELIB and similar system databases?

For simplicity's sake (in the initial implementation) we may just ignore that problem: if a deployed class references anything from a system database, we simply assume it is there.

In the ideal case we should have facilities to require a particular version (i.e. “>2014.1”) of a particular product (“Cache”, “Ensemble”, “HealthShare”, “EM”, etc.) or even some particular installed package (“iKnow”, “SetAnalysis”, etc.). It is too early at the moment, though, to try to invent a definitive mechanism, so we may leave this question unanswered.
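
Just to make the “>2014.1” style of requirement concrete, here is a minimal sketch of such a check. It is an assumption-laden illustration, not the definitive mechanism: the function names are mine, and it handles only a single comparison operator, not the full NPM range syntax.

```python
import re
from typing import Tuple

def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2014.1' into (2014, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def satisfies(installed: str, requirement: str) -> bool:
    """Check an installed version against a requirement such as '>2014.1'."""
    match = re.fullmatch(r"\s*(>=|<=|>|<|=)?\s*(\d+(?:\.\d+)*)\s*", requirement)
    if match is None:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    operator = match.group(1) or "="   # a bare version means an exact match
    have = parse_version(installed)
    need = parse_version(match.group(2))
    return {
        ">": have > need,
        ">=": have >= need,
        "<": have < need,
        "<=": have <= need,
        "=": have == need,
    }[operator]

print(satisfies("2015.1", ">2014.1"))
print(satisfies("2014.1", ">2014.1"))
```

Comparing tuples of integers rather than strings matters here: as text, "2014.10" would sort before "2014.2".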

Cross-platform binary modules

CPAN would not have been so successful if there were no way to distribute packages that are implemented partly in C and partly in Perl, whether for calling mission-critical, highly optimized code or as wrappers for some externally available C/C++ library. Perl has the XS API facilities, which allow you to call C code from a Perl module, and the reverse. If you look into the implementation details you quickly realize that XS is modelled very similarly to Caché callout: as in our case, there is no simple and direct way to call an arbitrary C/C++ API, and you have to write a wrapper to call it. But unlike callout, there are a number of service utilities which simplify the process of creating a wrapper, such as:

  • h2xs preprocessor to generate XS header using the given C header file (well with some limitations);
  • xsubpp - preprocessor to convert XS code to pure C code, etc;

When dealing with callout code from COS we get little help from the system, and most of the code has to be written manually. [Fortunately, we are now allowed to write DLL callouts and are not obliged to statically recompile the Caché kernel, the situation I remember from the early 2000s.]

There are a couple of rather modern, and relatively convenient approaches to call external C/C++ code from Caché kernel:

From the practical perspective, though, taking into account the multiple Caché platforms we should handle equally well (Windows, Linux, Mac OS X, or even VMS), and the fact that both of these FFIs (foreign-function interfaces) are not yet officially supported, I have to admit that neither of them is ready yet, and they cannot be recommended as a way to handle deployment of mixed C/COS packages. It is not a big issue now, but eventually, once we go cross-platform with binary packages, we should revisit this topic.

Unit Testing

The CPAN example showed us yet another good practice which may positively affect the maturity and stability of a 3rd party ecosystem: built-in testing support. Each popular Perl package has a built-in set of unit tests, which is supposed to be run after compilation completes and before installation happens. If the unit tests do not pass on the target system, the installation will not happen.

For simplicity's sake we may ignore unit-testing support in the 1st implementation, but once it evolves to a binary package format (i.e. ZIP) and binary module support is added, testing should become a required step before installation.

Command-line access

User experience is a key factor here: if the system is inconvenient to use, there is a big chance it will stay unnoticed. To be useful for the COS developer community we are supposed to handle “package installations” in both ways:

  • be it invoked from COS shell, via simple package-manager shell `do ^CPM`
  • or from command-line, i.e. `cpm install DeepSee-Mobile`
From a practical point of view they should be interchangeable and produce the same side effects for each operation. But having CLI access to the package manager is important for administrators because of their scripting needs.
In the longer term, once the infrastructure is established and mature enough, a GUI wrapper for package manipulation should be developed (say, callable from SMP), but a GUI is not required at the 1st step.
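
A minimal sketch of the command-line side, assuming a `cpm install <package>` syntax as in the example above. It is written in Python with argparse purely for illustration; the `install` function is a placeholder for the logic that both the shell and the CLI front end would share.

```python
import argparse
from typing import List, Optional

def install(package: str) -> str:
    """Placeholder for the real installation logic shared by both front ends."""
    return f"installing {package}"

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cpm", description="Package manager sketch")
    commands = parser.add_subparsers(dest="command", required=True)
    install_cmd = commands.add_parser("install", help="install a package")
    install_cmd.add_argument("package", help="package name, e.g. DeepSee-Mobile")
    return parser

def main(argv: Optional[List[str]] = None) -> str:
    """Parse the arguments and dispatch to the shared implementation."""
    args = build_parser().parse_args(argv)
    if args.command == "install":
        return install(args.package)
    raise SystemExit(f"unknown command: {args.command}")

# Equivalent to running: cpm install DeepSee-Mobile
print(main(["install", "DeepSee-Mobile"]))
```

Keeping the front end this thin is what makes the two entry points interchangeable: each one only parses input and calls the same operation.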

Mirroring and CDN

In the 1990s and 2000s each package management system faced yet another problem that had to be addressed separately: how to make the repository respond fast, preferably from a geo-optimized mirror location? And while we are on this topic, how to make that mirror system DDoS resistant at the same time? Such “old school” software repositories usually relied on community power to deploy a huge network of geographically spread mirrors (CPAN, CTAN, Debian, etc.). They still use the same approach, and still have multiple mirrors spread over the planet, but today we have an easier solution to the same problem.

Today cheap CDN providers are available. If we just need to host a set of static binary files, then a CDN is just “what the doctor ordered”. I have no idea which is the best choice for our scenario: whether it will be some generic VM-hosting provider like Amazon or Azure, or maybe we would need to choose between Amazon CloudFront, MaxCDN or something similar. Any of those is easy to handle nowadays and does not require any extra mirroring effort from the community.

If you have any prior experience with CDNs and have a strong preference for something, please give us your advice - we will be curious to know the details.

Final words

This is my simple take on the apparent problem of the missing convenient repository for 3rd party components used in the Caché database environment. Such components are either hard to find, or hard to install, or unmaintained, or all at once. We need more utilities, more classes available, and more developers involved in the ecosystem. A central repository like CPAN could be a trigger point in changing how an average Joe "the COS developer" develops new solutions.

I hope it is clear now that a package manager is doable right now, even with the current database platform support, and could be done in a reasonable amount of time. So...

So who wants to participate? Do we have community demand here?
