Tuesday, November 25, 2014

InterSystems Caché - Approaching Community Package Manager - Part I

Here is the problem as I see it in the Caché developer community: if you are a newbie COS/Ensemble/DeepSee developer, it is very hard to find a suitable 3rd party component, library or utility. They are spread all over the Internet: some are on GitHub, some on SourceForge, and a rare few live on their own sites. Yes, there are many useful components and tools (there is even a standalone debugger and a VCS), but it takes time to discover all the useful locations and to get used to this situation.
There is no single location, and there is no convenient way to find and install an extension or utility.
This does not compare well with other languages and development environments. In these articles I will try to investigate this problem further (this is part I) and will propose a simple solution (part II).

What do you think about a "package manager" and its importance for a community's success? For me, a package manager is an absolute must, and the most important ingredient for a language's success and the maturity of its ecosystem. You will hardly find a single language community without a convenient package manager and a huge collection of available 3rd party packages. After all these years spent hacking in Perl, Python, JavaScript, Ruby, Haskell, Java (you name it), you pretty much come to expect that when you start a new project you have plenty of external components which help you cook up the project quickly and seamlessly. `PM install this`, `PM install that`, and quite easily and quickly you get something working and usable. The community works for you.

These words about CPAN and its experience capture well the importance of the precedent CPAN set and its later impact on other languages and environments:

"Experienced Perl programmers often comment that half of Perl's power is in the CPAN. It has been called Perl's killer app. Though the TeX typesetting language has an equivalent, the CTAN (and in fact the CPAN's name is based on the CTAN), few languages have an exhaustive central repository for libraries. The PHP language has PECL and PEAR, Python has a PyPI (Python Package Index) repository, Ruby has RubyGems, R has CRAN, Node.js has npm, Lua has LuaRocks, Haskell has Hackage and an associated installer/make clone cabal; but none of these are as large as the CPAN. Recently, Common Lisp has a de facto CPAN-like system - the Quicklisp repositories. Other major languages, such as Java and C++, have nothing similar to the CPAN (though for Java there is central Maven).
The CPAN has grown so large and comprehensive over the years that Perl users are known to express surprise when they start to encounter topics for which a CPAN module doesn't exist already."



[Table: simplified overview of package managers; surviving labels: OS X (pkg), Application extensions]

Source redacted - https://en.wikipedia.org/wiki/Package_manager

Here you see a redacted (significantly simplified) picture from the Wikipedia article about package managers. There are a lot of them: source-based or binary-based, architecture-specific, OS-specific or cross-platform. I will try to cover them to some degree below. I will also mix some language-specific package management facilities into the picture, because without them it would not be of much use. And we are talking about language- and development-platform-specific package managers in any case...

One important observation you can draw from the table above: the more popular a particular operating system and API ecosystem is, the more likely you are to get multiple competing package managers for that platform. See the situation on Linux, Windows or Mac OS X as good examples. The more package managers are in use, the faster the ecosystem is evolving. Having multiple package managers is not a prerequisite for a fast pace of ecosystem development, but rather a side effect of it.
Simply put, if we eventually end up with several package managers with different repositories, then that would be an indication of a good ecosystem state, not a bad one.

Simplified timeline

Here is the approach I will use: take the most important operating system and language package managers, put them on a timeline, explain their specifics and why they are interesting for us, then draw some generalizations and conclusions for Caché as a platform.

As we all know, "a picture is worth a thousand words", so to simplify the explanation I have drawn this silly timeline, which references all the "important" (from my personal point of view) package managers used to date. The upper part is about language-specific package managers, and the lower part is about operating system/distribution-specific ones. The X-axis is in steps of 2 years (from January 1992 until today).

Package managers: Timeline from 1992 till 2014


The 90s were the years of source-based package managers. We already had the Internet working as a distribution channel, but all package managers operated according to the same scenario:

  • Given the requested package name, the PM downloaded the resolved tar file;
  • Extracted it locally to the user/site-specific area;
  • And invoked some predefined script for "building" and installing those sources into the locally installed distribution.
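
The three steps above can be sketched in a few lines of Python. This is only an illustration of the scenario, not how CPAN or CTAN are actually implemented: the function names `make_demo_tarball` and `install_from_tarball` are made up, the download step is replaced by bytes already held in memory, and the "predefined script" is modelled as a plain callback.

```python
import io
import tarfile
from pathlib import Path
from typing import Callable, Dict


def make_demo_tarball(files: Dict[str, bytes]) -> bytes:
    """Build a small .tar.gz in memory so the sketch runs without a network."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def install_from_tarball(archive: bytes, site_dir: Path,
                         build: Callable[[Path], None]) -> Path:
    """Unpack an already-downloaded tarball and run its build step."""
    site_dir.mkdir(parents=True, exist_ok=True)
    # Step 1 (download) is assumed done: `archive` holds the fetched bytes.
    # Step 2: extract regular files into the user/site-specific area.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # a real tool would also handle dirs, links, unsafe paths
            target = site_dir / member.name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(tar.extractfile(member).read())
    # Step 3: hand the extracted tree to the predefined build/install script.
    build(site_dir)
    return site_dir
```

In a real tool the callback would run something like the package's own makefile; here it can be any function that receives the directory with the extracted sources.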

CTAN was the 1st language distribution which established this handy practice of installing contributed content (TeX packages and extensions) from a central repository. However, the real explosion of this model happened when the Perl community adopted it: since CPAN's inception in 1995 it has collected "140,712 Perl modules in 30,670 distributions, written by 11,811 authors, mirrored on 251 servers."

It is very comfortable to work with a language and within an environment where, for each new task, the 1st question you ask is "Is there already a module for XXX?", and only a couple of seconds (well, minutes, taking into consideration Internet speeds in the mid-to-late nineties) after executing a single command, say:

>cpan install HTTP::Proxy

you have this module downloaded, the source extracted, a makefile generated, the module rebuilt using this generated makefile, all tests run, and sources, binaries and documentation installed into the local distribution. Everything is ready to use in your Perl environment via a simple "use HTTP::Proxy;"!

It is worth mentioning that most CPAN modules are Perl-only packages (i.e. besides Perl's Makefile.PL there are only source files written in Perl, so no extra processing is necessary, which simplifies deployment). But it is also worth noting that Makefile.PL is flexible enough to easily handle a combination of Perl sources with some binary modules (e.g. a program, usually written in C, which in turn is downloaded and compiled locally using the target-specific compiler and its ABI).

The developers of the statistical language R tried to apply in CRAN the same model used by TeX in CTAN and Perl in CPAN: a similar repository (archive) of all available sources, and a similar infrastructure for easy download and installation. The problem with CRAN was the language itself (R), which was not as famous or widespread as TeX or Perl. But despite this, even the "relatively rarely used R" has accumulated 6000+ extension packages.

BSD world: FreeBSD Ports, NetBSD pkgsrc, and Darwin Ports

During the same period, in the mid-90s, FreeBSD introduced its own way to distribute open-source software via its own "ports collection". Various BSD derivatives (like OpenBSD and NetBSD) maintained their own ports collections, with a few changes in the build procedures or the interfaces supported. But in any case the basic mechanism was the same once `cd /port/location; make install` was invoked:

  • Sources were fetched from the appropriate media (be it CD-ROM, DVD or Internet site);
  • The product was built using the given Makefile and the available compiler(s);
  • And the build targets were installed according to the rules written in the Makefile or other package definition file.

There was an option to handle all dependencies of a given port on request, so the full installation of a bigger package could still be initiated with a single command, and the package manager handled all the recursive dependencies appropriately.
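
Recursive dependency handling of this kind boils down to a depth-first walk that builds dependencies before the port that needs them. Below is a minimal Python sketch under stated assumptions: the function `install_order`, the `DEMO_PORTS` index and all port names in it are hypothetical, and a real ports tree keeps its dependency data in Makefiles rather than in a dictionary.

```python
from typing import Dict, List, Set

# Hypothetical index: port name -> ports it depends on (made-up names).
DEMO_PORTS: Dict[str, List[str]] = {
    "demo-app": ["libfoo", "libbar"],
    "libfoo": ["libbase"],
    "libbar": [],
    "libbase": [],
}


def install_order(port: str, depends: Dict[str, List[str]]) -> List[str]:
    """Return ports in build order: every dependency precedes its dependents."""
    order: List[str] = []
    in_progress: Set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled through another dependent
        if name in in_progress:
            raise ValueError(f"dependency cycle through {name!r}")
        in_progress.add(name)
        for dep in depends.get(name, []):
            visit(dep)
        in_progress.discard(name)
        order.append(name)

    visit(port)
    return order
```

Calling `install_order("demo-app", DEMO_PORTS)` schedules `libbase` and `libfoo` before `libbar` and finally `demo-app`; a cycle in the index is reported as an error instead of looping forever.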

From the perspective of licensing and lineage, I consider Darwin Ports/MacPorts a derivative of this BSD ports collection idea: we still have a collection of open-source software which is conveniently handled by a single command, e.g.:

$ sudo port install apache2

It's worth emphasizing that, up to this point, both the language-based repositories (CTAN/CPAN/CRAN) and the BSD ports collections (FreeBSD/OpenBSD/NetBSD/MacPorts) all represented the 1st class of package managers: source-code-based package managers.

Linux: Debian and Red Hat

The source-code-based package management model worked well (to some degree) and gave an impression of full transparency and full control. There were only several "small" problems:

  • Not all software can be deployed in source form: there is real life beyond open-source software, and proprietary software still needs to be deployed conveniently;
  • And building a big project may take a huge chunk of time (hours).

There was apparently a need to establish a way to distribute packages (and all their dependencies) in binary form, already compiled for the given architecture and ready for consumption. So binary package formats were introduced, and the 1st one of interest to us is the .deb format used by the Debian package manager (dpkg). The original format, used before Debian 0.93 (March 1995), was just a tar.gz wrapper with some magic ASCII prefixes. The current .deb package is both simpler and more complex: it is just an AR archive consisting of 3 files (debian-binary with the format version, control.tar.gz with metadata and data.tar.* with the installed files). You rarely use dpkg directly in practice: most current Debian-based distributions use APT (Advanced Packaging Tool). Surprisingly (at least for me at the moment I started writing this review), APT has outgrown Debian distros and has been ported to Red Hat-based distros (APT-RPM), to Mac OS X (Fink), and even to Solaris.
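
To make the three-member AR layout more tangible, here is a small Python sketch that lists the members of an ar archive. It relies only on the common ar layout (an 8-byte global magic followed by 60-byte member headers, with data padded to an even length); the function name `list_ar_members` is made up for this illustration, and the sketch ignores long-name extensions that some ar variants use.

```python
import io
from typing import BinaryIO, List, Tuple

AR_MAGIC = b"!<arch>\n"   # global header of every ar archive
HEADER_SIZE = 60          # fixed size of each member header


def list_ar_members(stream: BinaryIO) -> List[Tuple[str, int]]:
    """Return (name, size) for each member of an ar archive, in file order."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members: List[Tuple[str, int]] = []
    while True:
        header = stream.read(HEADER_SIZE)
        if not header:
            break  # clean end of archive
        if len(header) != HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # Bytes 0-15 hold the space-padded name, bytes 48-57 the decimal size.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        # Skip the member data; it is padded to an even number of bytes.
        stream.seek(size + (size % 2), io.SEEK_CUR)
    return members
```

Opened in binary mode, a .deb file passed to this function should report the `debian-binary`, `control.tar.*` and `data.tar.*` members mentioned above, in that order.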

"Apt can be considered a front-end to dpkg, friendlier than the older dselect front-end. While dpkg performs actions on individual packages, apt tools manage relations (especially dependencies) between them, as well as sourcing and management of higher-level versioning decisions (release tracking and version pinning)."


The rich functionality and ease of use of apt-get have influenced all later package managers.

Another good example of a binary packaging system is RPM (Red Hat Package Manager). RPM was introduced with Red Hat 2.0 in late 1995. Red Hat quickly became the most popular Linux distribution (and RPM's solid features were one of the factors in winning the competition here, up to some point at least). So it is no big surprise that RPM came to be used by all Red Hat-based distributions (e.g. Mandriva, ASPLinux, SUSE, Fedora or CentOS), and even further, beyond Linux, it was also used by Novell NetWare and IBM AIX.

Similar to APT for dpkg, there is the Yum wrapper for RPM packages, which is frequently used by end users and which provides similar high-level services such as dependency tracking and building/versioning.

Mobile software stores: iOS App Store, Android Market/Google Play

Since the introduction of the Apple iOS App Store, and later Google's Android Market, we have got what are most probably the most popular software repositories seen to date. They are essentially OS-specific package managers with an extra API for online purchases. Hardware diversity is not yet an issue for the App Store, but it is an issue for Android Market / Google Play: Android devices use multiple hardware architectures (ARM, x86 and MIPS at the moment), so some extra care must be taken before a customer can download and install a binary package containing executable code for an application. Given hardware-agnostic Java code, you are either supposed to compile Java to a native binary upon installation on the target device, or the repository itself could take care of this and recompile the code (with full optimizations enabled) in the cloud, before it is downloaded to the customer's device.

In any case, regardless of where and how such optimizing native adaptation is done, this part of the installation process is considered a major part of the software packaging services provided by the operating system. If software is supposed to run on many hardware architectures, and if we are not deploying it in source-code form (as is done in BSD or Linux), then the repository and package manager should handle this problem transparently and efficiently.

For the time being, I am not considering any cross-platform issues, and in the 1st implementation will handle only fully open-source packages. We may return to this question later, to resolve both cross-version and cross-architecture issues simultaneously.

Windows applications: Chocolatey NuGet

It was a long-standing missing feature: despite the popularity of Windows on the market, we did not have any central repository, as convenient as apt-get is for Debian, where we could find and install any (many/most) of the available applications. There was the Windows Store for Windows Metro applications (but nobody wanted to use them :) ). Even before that, there was the nice and convenient NuGet package manager, installed as a plugin to Visual Studio, but (the impression was that) it only served .NET packages and was not targeting "generic Windows desktop applications". Going back even further, there was the Cygwin repository, from which you could download (quite conveniently, though) all Cygwin applications (from bash, to gcc, to git, or X-Window). But this was, once again, not about "any generic Windows application", but only about ported POSIX (Linux, BSD, and other UNIX-compatible APIs) applications which could be recompiled using the Cygwin API.

That's why the development of Chocolatey NuGet in 2012 came as a nice surprise to me: apparently, with NuGet as the basis for a package manager, some added PowerShell voodoo upon installation, and an added central repository, you could have pretty much the same level of convenience as with apt-get on Linux. Everything can be deployed/wrapped as a Chocolatey package, from Office 365, to the Atom editor, or TortoiseGit, or even Visual Studio 2013 Ultimate! It quickly became the best friend of the Windows IT administrator, and many extra tools using Chocolatey as their low-level basis have been developed. The best example of these is Boxstarter, the easiest and fastest way to install Windows software on fresh Windows installations.

Chocolatey shows nothing new that we have not seen before in other operating systems. It just shows that with a proper basis (NuGet as a package manager, PowerShell for post-processing, plus a capable central repository) one can build a generic package manager which attracts attention quite fast, even on an operating system where this was unusual. BTW, it is worth mentioning that Microsoft decided to jump on board, and is now using Chocolatey as one of the repositories that will be available in its own OneGet package manager, available starting with Windows 10.

On a personal note, I should admit that I do not like OneGet as much as I like Chocolatey: there is too much PowerShell plumbing I would need to do for OneGet. And from a user experience perspective, Chocolatey hides all these details and looks much, much easier to use.

Node.js NPM

There are multiple factors which have led to the recent dramatic success of JavaScript as a server-side language. One of the most important factors in this success (at least IMVHO) is the availability of the central Node.js module repository, NPM (Node Package Manager). NPM has been bundled with the Node.js distribution since version 0.6.3 (November 2011).

NPM is modeled similarly to CPAN: you have a wrapper which, from the command line, connects to the central repository, searches for the requested module, downloads it, parses the package meta-information and, if there are external dependencies, processes them recursively. In a few moments, you have working binaries and sources available for local usage:

C:\Users\Timur\Downloads>npm install -g less
npm http GET https://registry.npmjs.org/less
npm http 304 https://registry.npmjs.org/less
npm http GET https://registry.npmjs.org/graceful-fs
npm http GET https://registry.npmjs.org/mime
npm http GET https://registry.npmjs.org/request
npm http GET https://registry.npmjs.org/isarray/-/isarray-0.0.1.tgz
npm http 200 https://registry.npmjs.org/isarray/-/isarray-0.0.1.tgz
npm http 200 https://registry.npmjs.org/asn1
npm http GET https://registry.npmjs.org/asn1/-/asn1-0.1.11.tgz
npm http 200 https://registry.npmjs.org/asn1/-/asn1-0.1.11.tgz
C:\Users\Timur\AppData\Roaming\npm\lessc -> C:\Users\Timur\AppData\Roaming\npm\node_modules\less\bin\lessc
less@2.0.0 C:\Users\Timur\AppData\Roaming\npm\node_modules\less
├── mime@1.2.11
├── graceful-fs@3.0.4
├── promise@6.0.1 (asap@1.0.0)
├── source-map@0.1.40 (amdefine@0.1.0)
├── mkdirp@0.5.0 (minimist@0.0.8)
└── request@2.47.0 (caseless@0.6.0, forever-agent@0.5.2, aws-sign2@0.5.0, json-stringify-safe@5.0.0, tunnel-agent@0.4.0, stringstream@0.0.4, oauth-sign@0.4.0, node-uuid@1.4.1, mime-types@1.0.2, qs@2.3.2, form-data@0.1.4, tough-cookie@0.12.1, hawk@1.1.1, combined-stream@0.0.7, bl@0.9.3, http-signature@0.10.0)

It is worth noting one change which the NPM authors introduced to package manager practice: they use the JSON format for package meta-information, instead of the Perl-based ones used in CPAN.
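
One practical consequence of JSON meta-information is that any language with a JSON parser can read it. The Python sketch below pulls the `name`, `version` and `dependencies` fields (the field names NPM uses in its package.json) out of a manifest; the function `read_manifest` and every value in `DEMO_MANIFEST` are made up for illustration and do not describe a real package.

```python
import json
from typing import Dict, Tuple

# Hypothetical manifest text; package names and version ranges are invented.
DEMO_MANIFEST = """
{
  "name": "demo-package",
  "version": "1.0.0",
  "dependencies": {
    "left-helper": "0.1.x"
  }
}
"""


def read_manifest(text: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (name, version, dependencies) from package.json-style text."""
    meta = json.loads(text)
    # A package without external dependencies may omit the field entirely.
    dependencies = dict(meta.get("dependencies", {}))
    return meta["name"], meta["version"], dependencies
```

The dependency map returned here is exactly the kind of input a recursive installer needs in order to decide what to fetch next.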

To be continued…

OK, enough about current practice elsewhere. In the 2nd part of this article we will talk in more detail about a simple proposal for a package manager for Caché. Stay tuned!
