Channel: Pedro Alves on Business Intelligence

Seeing PDI lineage information



Since Pentaho 6.0 there's support for lineage in PDI. From my announcement: Keeping on the theme of governance and compliance, organizations not only need to know what ETL jobs and transformations they have, they need to know exactly what is happening at any point in time: what files, what parameters, when, by whom, etc, etc.

So now you can tell PDI to capture that runtime information (a core CE capability) as well as consume it through the new MetaIntegration bridge (EE), so that you can visualize it alongside your broader lineage context.

And how do you enable that? Here's all the info you need:

The first section comes directly from help.pentaho.com's lineage page

Setting up Data Lineage

Pentaho now offers you the ability to visualize the end-to-end flow of your data across PDI transformations and jobs, providing you with valuable insights to help you maintain meaningful data. This ability to track your data from source systems to target applications allows you to take advantage of third-party tools, such as Meta Integration Technology (MITI) and yEd, to track and view specific data.
Once lineage tracking is enabled, PDI will generate a .graphml file every time you run a transformation. You can then open this file using a third-party tool, such as yEd, to view a tree diagram of the data. By parsing through and teasing out the different parts of the graph, you can gain an end-to-end view into a specific element of data from origin to target. This ability can aid you in both data lineage and impact analysis:
  • Data lineage provides the ability to discover the origins of an element of data and describes the sequence of jobs and transformations which have occurred up to the point of the request for the lineage information.
  • Impact analysis is the reverse flow of information which can be used to trace the use and consumption of a data item, typically for the purpose of managing change or assessing and auditing access.

Sample Use Cases

Data lineage and impact analysis can be applicable in several ways.
As an ETL Developer:
  • There are changes in my source system, such as fields which are added, deleted and renamed. What parts of my ETL processes need to adapt? (Impact Analysis)
  • I need additional information in my target system, such as for reports. What sources can provide this additional information? (Data Lineage)
As a Data Steward:
  • There is a need for auditability and transparency to determine where data is coming from. A global, company-wide, metadata repository needs data lineage information from different systems and applications, i.e. very fine-grained metadata.
  • What elements (fields, tables, etc.) in my ETL processes are never used? How many times is a specific element used in some or all of my ETL processes?
As a Report/Business User:
  • Is my data accurate?
  • I want to find reports which include specific information from a source, such as a field. This process is "data discovery." For example, are there any data sources which include sales and gender? Are there any reports which include sales and zip codes?
As a Troubleshooting Operator:
  • The numbers in the report are wrong (or suspected to be wrong). What processes (transformations, jobs) are involved, to help me determine where these numbers are coming from?
  • A job or transformation did not finish successfully. What target tables and fields are affected which are used in the reports?
As an Administrator:
  • For documentation and auditing purposes, I want to have a report on external sources and target fields, tables, and databases of my ETL processes. I need the data for a specific date and version.
  • To ensure compliance, I want to validate naming conventions of artifacts (fields, tables, etc.)
  • For integration into third-party data lineage tools, I want a flexible way of exporting the collected data lineage information.

Architecture

Pentaho's data lineage capabilities allow us to take advantage of tools from Meta Integration Technology (MITI). If you use a lot of different systems and applications, you can track and visualize specific data across these systems using Pentaho's lineage capabilities and third-party tools such as MITI and yEd.
[Figure: LineagePPTGraphCropped.png — lineage architecture overview]

Setup

Modify …\system\karaf\etc\pentaho.metaverse.cfg (Client & DI-Server when needed):
  • You need to enable lineage explicitly by setting lineage.execution.runtime = on
  • Modify the default folder for lineage GraphML files accordingly: lineage.execution.output.folder=./pentaho-lineage-output
  • Set lineage.execution.generation.strategy=latest (the default)
After the execution of a job or transformation, the GraphML files are generated in the defined folder.
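Put together, the relevant lines of pentaho.metaverse.cfg look like this (values as discussed above):

```
lineage.execution.runtime=on
lineage.execution.output.folder=./pentaho-lineage-output
lineage.execution.generation.strategy=latest
```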

API

It is also possible to access the GraphML information via a DI-Server API. There are REST endpoints available to retrieve the lineage related artifacts.
Below are some example curl commands which exercise the REST endpoints available on the DI Server. These calls use basic authentication. For more information on the various ways to authenticate with a BA or DI server, see the "Authenticate with the Server Before Making Service Calls" topic on the Pentaho Documentation site.
For more detailed information about the available REST endpoints, you can go to the Pentaho Wiki to view the attached Enunciate file.
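The Authorization header values in the curl examples below are simply the Base64 encoding of user:password (YWRtaW46cGFzc3dvcmQ= decodes to admin:password). A quick sketch for generating the header value for your own credentials:

```python
import base64

def basic_auth_header(user, password):
    """Return the value for an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return "Basic " + token

print(basic_auth_header("admin", "password"))  # Basic YWRtaW46cGFzc3dvcmQ=
```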

Get all lineage related artifacts

curl --header "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" http://localhost:9080/pentaho-di/osgi/cxf/lineage/api/download/all
Get all lineage from a given date forward

curl --header "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" http://localhost:9080/pentaho-di/osgi/cxf/lineage/api/download/all/20150706
Get all lineage between 2 dates

curl --header "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" http://localhost:9080/pentaho-di/osgi/cxf/lineage/api/download/all/20150101/20150706
Get all of the lineage artifacts for a specific file in the DI repo

curl --request POST --header "Content-Type: application/json" --header "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" --data '{"path": "/LOCAL DI REPO/home/admin/dataGrid-dummy"}' http://localhost:9080/pentaho-di/osgi/cxf/lineage/api/download/file
Get all lineage related artifacts for a specific file in the DI repo between 2 dates

curl --request POST --header "Content-Type: application/json" --header "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" --data '{"path": "/LOCAL DI REPO/home/admin/dataGrid-dummy"}' http://localhost:9080/pentaho-di/osgi/cxf/lineage/api/download/file/20150701/20150707
Invalid date request

curl --header "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" http://localhost:9080/pentaho-di/osgi/cxf/lineage/api/download/all/20159999
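The same calls can be made programmatically. Here is a minimal sketch using only the Python standard library; the host, port, and credentials match the curl examples above and will differ in your environment:

```python
import base64
import urllib.request

BASE = "http://localhost:9080/pentaho-di/osgi/cxf/lineage/api"

def lineage_request(path, user="admin", password="password"):
    """Build an authenticated GET request for a lineage REST endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(BASE + path)
    req.add_header("Authorization", "Basic " + token)
    return req

req = lineage_request("/download/all/20150101/20150706")
# urllib.request.urlopen(req) would then download the lineage artifacts
```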

Setting up yEd to consume the generated data

The tutorial above shows how to generate the data. Next comes the consuming part. EE customers can leverage the MetaIntegration bridge, but everyone can also use yEd to read that info.

There are some steps and configurations needed to do that, though - or else you'll be puzzled to just see a single box, which is clearly... hum... less than ideal :p

yEd is a graph visualization tool that allows you to build or render graphs from .graphml files.
The .cnfx files referenced below are property mapper configurations that enable consistent rendering of Pentaho metaverse graphs.
To install the properties configuration:
  1. Download this .cnfx configuration file
  2. Open yEd
  3. Navigate to the Edit menu and select Properties Mapper... (if you are updating existing configurations, remove the old ones first)
  4. Click the import icon on the top-left panel of the dialog to import the properties configuration into your yEd installation.
To use the configuration:
  1. Open your .graphml file in yEd.
  2. Navigate to the Edit... menu, Properties Mapper ...
  3. Select the configuration on the left, and click the Apply button at the bottom right of the dialog to apply the selected configuration formatting. You will need to do this once for each configuration you would like to apply.
And you should be done.
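Outside yEd, the generated files are plain GraphML (XML), so they are also easy to inspect programmatically. A sketch, using a tiny hand-made stand-in for a real lineage file (real output is far larger, and the node ids here are invented for illustration):

```python
import xml.etree.ElementTree as ET

NS = "{http://graphml.graphdrawing.org/xmlns}"

# A tiny stand-in for a PDI-generated lineage file
sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="step1"/>
    <node id="step2"/>
    <edge source="step1" target="step2"/>
  </graph>
</graphml>"""

root = ET.fromstring(sample)
nodes = root.findall(f".//{NS}node")
edges = root.findall(f".//{NS}edge")
print(len(nodes), len(edges))  # 2 1
```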

Taking it further

To make this lineage information better and better, we encourage plugin contributors to also include lineage information in their submissions.


Cheers

-pedro


Running Pentaho 6 in Mac OS X 10.11 El Capitan


The problem - and the fix

When El Capitan was released we immediately noticed problems running parts of the Pentaho stack.

PDI

PDI-14470 tracked this issue. Spoon doesn't even start, which is... let's say... somewhat inconvenient. This wasn't actually an issue with Pentaho, but with SWT itself, and a lot more people were affected.

Our devs got a lot of help from the community (thanks André, Scott and Christian, among others). In the end, what you want to know is this (courtesy of MDamour):
  • Remove (or move aside): data-integration/lib/pentaho-xul-swt-6.0.0.0-353.jar
  • Replace it with the file pentaho-xul-swt-EXPERIMENTAL.jar.
We have tested all of the reported scenarios above: Spoon won't start, can't explore the repository, can't edit a database connection, etc. Please report any additional findings.
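For reference, the swap amounts to two file operations, which could be scripted like this (a sketch only; the paths assume a default PDI layout, and the replacement jar must already be downloaded):

```python
import shutil
from pathlib import Path

def swap_jar(lib_dir, old_name, replacement):
    """Move the original jar aside (as .bak) and copy in the replacement."""
    lib = Path(lib_dir)
    old_jar = lib / old_name
    # keep the original around so it can be restored later
    old_jar.rename(old_jar.with_name(old_jar.name + ".bak"))
    shutil.copy(replacement, lib / Path(replacement).name)

# e.g. swap_jar("data-integration/lib",
#               "pentaho-xul-swt-6.0.0.0-353.jar",
#               "pentaho-xul-swt-EXPERIMENTAL.jar")
```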
Our EE customers will have this already available in the next service patch.

As you may understand, all our focus was on getting this to work on Pentaho 6.0. But we've heard from users who did exactly the same on 5.x versions and reported that it works as well. Use at your own risk.

If you still find any issues, please add comments to the case PDI-14470 or create a new one.

Platform installer

A new feature of El Capitan, System Integrity Protection (SIP), also broke our installers. Simply put, PostgreSQL couldn't be added to the init scripts. BISERVER-12894 tracks this issue.

This has been successfully resolved now, and as soon as we do the last set of tests we'll be updating the download bundles.

Going forward

You may be asking yourselves, "How could this take them by surprise?". Well, don't worry, we ask ourselves the same question... The truth is that we internally tested a beta version back in July and everything worked great; all these issues appeared afterwards.

But regardless - we _have_ to get better!


-pedro





New Ctools releases - 15.10.26



Huge release here! The team's been very, very busy.
You'll see a bigger report than usual because we had to rework the way we maintain the 5.x and 6.x plugins.

Release Notes - Community Dashboard Framework - Version 15.10.26

Bug

  • [CDF-308] - CGG - update test cases
  • [CDF-349] - Dashboards.removeComponent does not remove components with the same name, if an object is passed.
  • [CDF-351] - Dashboards.addComponents does not check for components with the same name
  • [CDF-578] - Running Schedule Prpt require sample and selecting a recurrence does not add input fields
  • [CDF-582] - [UX] Add a waiting.gif animation to FilterComponent
  • [CDF-588] - Sample Map Dashboard returns error when clicking in a bar chart
  • [CDF-589] - Incorrect file name for sample MetaLayer Home Dashboard
  • [CDF-591] - When updating a table component, with an open expanded row, the current expand is always empty
  • [CDF-592] - CDF views services don't check if CDF is initialized
  • [CDF-593] - CPF init requires that config.properties in repo be publicly readable
  • [CDF-597] - FilterComponent - search input behaviour when there's no match
  • [CDF-598] - FilterComponent deletes its placeholder when it is updated from an "unavailable" state to a state with data
  • [CDF-600] - Filter Component: 'Show "Only" Button' property is not working properly
  • [CDF-604] - Inside callback formatters, Template Component, this is related window and Utils is not available.
  • [CDF-608] - cdf/dashboard/Utils#escapeHtml allows XSS and is dangerous
  • [CDF-609] - Filter Component only allows server-side searching if pagination is also enabled
  • [CDF-614] - In the ButtonComponent, the successCallback (and failureCallback) are not receiving any query result
  • [CDF-617] - use common-ui's RequireJS loader plugins when available
  • [CDF-619] - CDE xaction components doesn't work
  • [CDF-625] - CDF Tutorial - The sample page of "Table + Sparkline" is always loading because the error "'wcdfSettings' is undefined "
  • [CDF-626] - autocomplete does not update associated parameter when selection is made
  • [CDF-628] - TableComponent sample is not render with bootstrap style
  • [CDF-629] - Map component does not pan when we try to drag over a map shape
  • [CDF-634] - inconsistent "/export" endpoint usage
  • [CDF-641] - Using FilterComponent with Values array instead of a Datasource logs an error.
  • [CDF-642] - “not well-formed” warning when loading client-side JSON in Firefox
  • [CDF-644] - On require dashboards, when embedding a dashboard, if the dashboard name has a space, the dashboard does not load properly
  • [CDF-649] - Can't use parameters on Schedule Prpt Component
  • [CDF-651] - Schedule Prpt sample makes reference to showParameters property
  • [CDF-652] - Using a single selector with a pre-determined value, if the parameter is an integer it returns null
  • [CDF-653] - ButtonComponent will no longer work without a label
  • [CDF-654] - DataBar Addin liquid default broke some expected behavior
  • [CDF-655] - Backport fix BACKLOG-5349 to legacy cdf js
  • [CDF-660] - Template addin is having problems applying formatters
  • [CDF-661] - blockG selector in Filter Component CSS became too generic
  • [CDF-662] - Unable to use a previously saved dashboard view object
  • [CDF-663] - FilterComponent uses the search pattern always as lowercase
  • [CDF-665] - Filter Component - "Only" button in SingleSelection filter has incorrect behavior
  • [CDF-666] - Filter Component - "Only" button behavior in limited MultiSelection filter
  • [CDF-671] - Incorrect behavior on Search bar of a Filter Component
  • [CDF-676] - Duplicated text and some grammatical mistakes in FilterComponent samples
  • [CDE-570] - In sample FilterComponent (filter_visual_guide) on Multiple selections the word Only is over the numbers
  • [CDE-326] - MultiButton component: if the same button is clicked N times, processChange is called N times
  • [CDE-620] - Minification is breaking Filter Component accordion
  • [CDE-615] - ButtonComponent doesn't show label
  • [CDE-608] - Unable to use custom components from external plugins in embedded scenarios
  • [CDE-582] - Table component does not respect lifecycle.silent tag and still runs the block function
  • [BISERVER-12645] - Update Your BA Server Steps Break Analyzer in 5.4-GA

Improvement

  • [CDF-572] - As a dashboard developer, in an embedding context, I want to have a way to normalize an html id inside each dashboard.
  • [CDF-583] - Refactor CDF's FilterComponent 'templates.js' to break it down into a set of moustache template files
  • [CDF-594] - CPF config should have an option enable/disable starting the local OrientDB instance
  • [CDF-602] - Filter Component - General update to selector styles
  • [CDF-611] - As a developer, I would like the Action Component to show a blockUI while executing an action, so that the user is notified that some server-side action is taking place
  • [CDF-613] - The implementation of the ButtonComponent should be refactored to avoid duplicated code
  • [CDF-615] - Allow Databar addin alignment
  • [CDF-623] - promote datasources to dashboard "first class citizens"
  • [CDF-639] - Would like to be possible to use the same addin, on the template component, with different options on the same template, passing an id when setting when calling the addin.
  • [CDF-659] - Revamp to the template addin: exposing a function for model parsing and also allow to customize messages as part of the options.

New Feature

  • [CDF-584] - Create CDF Samples for FilterComponent
  • [CDF-637] - As a web developer I want to be able to have more addins on the template component 

Release Notes - Community Chart Components - Version 15.10.26

  • [CDF-620] - CCC - Error is thrown when hovering over a legend marker with plot2 after hiding a series
  • [ANALYZER-3116] - Visualizations - Column-Line Combo - Incorrect Color of Interpolated Points

Release Notes - Community Data Access - Version 15.10.26

Bug

  • [CDA-65] - Compound Join Query breaks when one of the queries has a column with null values
  • [CDA-151] - After Updating C-Tools Using the Install Script non-admin users are unable run Saiku Ad-Hoc reports.
  • [CDA-156] - Exporting does not work for some samples
  • [CDA-161] - Adding value to "Column" property only triggers cache refresh if there's an "Output Index" 

Release Notes - Community Dashboard Editor - Version 15.10.26

Bug

  • [CDE-142] - CDE crashes if it does not find the .html template defined in a .wcdf file
  • [CDE-326] - MultiButton component: if the same button is clicked N times, processChange is called N times
  • [CDE-544] - Saving a CDE dashboard with filename I-[~!@#$%^&*(){}|.,]-=_+|;'"?<>~` and a normal title fails with 404 Page not found
  • [CDE-561] - Required dashboards fail to work with components that depend on DOM objects other than htmlObject
  • [CDE-570] - In sample FilterComponent (filter_visual_guide) on Multiple selections the word Only is over the numbers
  • [CDE-571] - IE 11 : Options Cancel and OK disable when adding HTML code
  • [CDE-576] - IE10/11: The icon Drag and Drop in layout structure is align top
  • [CDE-577] - IE8/IE9: Clicking New in CDE editor throw a new page with error
  • [CDE-578] - webservice "olap/getCubes" not fetching latest cube
  • [CDE-582] - Table component does not respect lifecycle.silent tag and still runs the block function
  • [CDE-584] - CDE renders incomplete AMD module ids for JS/CSS resources with an empty "name" property
  • [CDE-591] - Filter Component refers an undefined property - inputParameter
  • [CDE-594] - It is only possible to load resources in the same folder
  • [CDE-595] - Applying templates to a bootstrap dashboard introduces blueprint elements
  • [CDE-598] - The sample Map Component Reference with require-js is broken (probably missing a resource)
  • [CDE-599] - Dashboards require with a Date Parameter do not render
  • [CDE-600] - The documentation of the filter component does not explicitly explain that the selected items are written to the parameter as an array of IDs
  • [CDE-604] - JS File Resources with no name break the resource order
  • [CDE-608] - Unable to use custom components from external plugins in embedded scenarios
  • [CDE-611] - Can't close error popup
  • [CDE-613] - As a dashboard developer using the NewMapComponent, I would like to react to a change on the zoom level or map position
  • [CDE-614] - As a dashboard developer using the NewMapComponent, I would like to create addIns that provide shape definitions
  • [CDE-615] - ButtonComponent doesn't show label
  • [CDE-616] - As a developer using the NewMapComponent, I would like to use the GeoJSON format internally
  • [CDE-617] - Extension Points popup shows wrong value
  • [CDE-618] - Custom Params don't work on a required [Requirejs] dashboard
  • [CDE-620] - Minification is breaking Filter Component accordion
  • [CDE-622] - Adding lots of parameters to a datasource results in bad formatting
  • [CDE-623] - Table is not assuming Bootstrap as Style
  • [CDE-624] - MQL query example misleads the dashboard developer.
  • [CDE-626] - unable to close "OLAP MDX query" popup
  • [CDE-627] - Changing renderer does not prevent cached version of dashboard to be loaded
  • [CDE-630] - Using tab key when popup is opened allows user to edit fields beneath the popup
  • [CDE-634] - using "save as" on a requireJS supported dashboard saves the new dashboard (wcdf settings) as a legacy dashboard
  • [CDE-637] - DashboardComponent will not map parameters in embedded scenarios
  • [CDE-640] - DashboardComponent can't map parameters in Firefox
  • [CDE-641] - The ButtonComponent does not execute its datasource when clicked
  • [CDE-646] - Can't open type selector in an datasource parameters popup
  • [CDE-650] - validation popup doesn't close
  • [CDE-651] - Post Execution is running before map renderisation is concluded
  • [CDE-652] - resources endpoint doesn't respect downloadable-formats whitelist
  • [CDE-659] - CDE save new dashboard: "Description" field isn't being considered
  • [CDF-629] - Map component does not pan when we try to drag over a map shape
  • [CDF-676] - Duplicated text and some grammatical mistakes in FilterComponent samples


Improvement

  • [CDE-553] - Review Pop ups in CDE
  • [CDE-556] - CDE should expose default parameter for CDA cacheKeys
  • [CDE-558] - Resultset created by postChange() should be persisted
  • [CDE-579] - save CDA queries using CDATA sections
  • [CDE-589] - Avoid production use of the global `dashboard` variable
  • [CDE-590] - CDE Parameters should have a Public/Private property
  • [CDE-621] - The Dashboard component should allow swapping datasources
  • [CDE-643] - Add def, pv and pvc to the default AMD dependency list when rendering CDE dashboards
  • [CDE-648] - Setting shape opacity and stroke in NewMapComponent
  • [CDE-649] - As an admin I would like to define users/roles with a permission to create/view CDE reports
  • [CDF-579] - Integrate new Selector component into CDF
  • [CDF-637] - As a web developer I want to be able to have more addins on the template component 

Release Notes - Community Graphics Generator - Version 15.10.26

Improvement

  • Upgraded to last CCC release

Release Notes - Community Dashboard Framework - Version 6.0-15.10.26

Bug

  • [CDF-349] - Dashboards.removeComponent does not remove components with the same name, if an object is passed.
  • [CDF-351] - Dashboards.addComponents does not check for components with the same name
  • [CDF-578] - Running Schedule Prpt require sample and selecting a recurrence does not add input fields
  • [CDF-597] - FilterComponent - search input behaviour when there's no match
  • [CDF-609] - Filter Component only allows server-side searching if pagination is also enabled
  • [CDF-626] - autocomplete does not update associated parameter when selection is made
  • [CDF-629] - Map component does not pan when we try to drag over a map shape
  • [CDF-641] - Using FilterComponent with Values array instead of a Datasource logs an error.
  • [CDF-644] - On require dashboards, when embedding a dashboard, if the dashboard name has a space, the dashboard does not load properly
  • [CDF-649] - Can't use parameters on Schedule Prpt Component
  • [CDF-651] - Schedule Prpt sample makes reference to showParameters property
  • [CDF-652] - Using a single selector with a pre-determined value, if the parameter is an integer it returns null
  • [CDF-654] - DataBar Addin liquid default broke some expected behavior
  • [CDF-655] - Backport fix BACKLOG-5349 to legacy cdf js
  • [CDF-660] - Template addin is having problems applying formatters
  • [CDF-661] - blockG selector in Filter Component CSS became too generic
  • [CDF-662] - Unable to use a previously saved dashboard view object
  • [CDF-663] - FilterComponent uses the search pattern always as lowercase
  • [CDF-665] - Filter Component - "Only" button in SingleSelection filter has incorrect behavior
  • [CDF-666] - Filter Component - "Only" button behavior in limited MultiSelection filter
  • [CDF-671] - Incorrect behavior on Search bar of a Filter Component
  • [CDF-676] - Duplicated text and some grammatical mistakes in FilterComponent samples

Improvement

  • [CDF-572] - As a dashboard developer, in an embedding context, I want to have a way to normalize an html id inside each dashboard.
  • [CDF-583] - Refactor CDF's FilterComponent 'templates.js' to break it down into a set of moustache template files
  • [CDF-639] - Would like to be possible to use the same addin, on the template component, with different options on the same template, passing an id when setting when calling the addin.
  • [CDF-659] - Revamp to the template addin: exposing a function for model parsing and also allow to customize messages as part of the options.

New Feature

  • [CDF-584] - Create CDF Samples for FilterComponent
  • [CDF-637] - As a web developer I want to be able to have more addins on the template component 

Release Notes - Community Graphics Generator - Version 6.0-15.10.26

Improvement

  • Upgraded to last CCC release 

Release Notes - Community Dashboard Editor - Version 6.0-15.10.26

Bug

  • [CDE-142] - CDE crashes if it does not find the .html template defined in a .wcdf file
  • [CDE-561] - Required dashboards fail to work with components that depend on DOM objects other than htmlObject
  • [CDE-576] - IE10/11: The icon Drag and Drop in layout structure is align top
  • [CDE-591] - Filter Component refers an undefined property - inputParameter
  • [CDE-600] - The documentation of the filter component does not explicitly explain that the selected items are written to the parameter as an array of IDs
  • [CDE-622] - Adding lots of parameters to a datasource results in bad formatting
  • [CDE-624] - MQL query example misleads the dashboard developer.
  • [CDE-630] - Using tab key when popup is opened allows user to edit fields beneath the popup
  • [CDE-634] - using "save as" on a requireJS supported dashboard saves the new dashboard (wcdf settings) as a legacy dashboard
  • [CDE-646] - Can't open type selector in an datasource parameters popup
  • [CDE-650] - validation popup doesn't close
  • [CDE-651] - Post Execution is running before map renderisation is concluded
  • [CDF-629] - Map component does not pan when we try to drag over a map shape
  • [CDF-676] - Duplicated text and some grammatical mistakes in FilterComponent samples

Improvement

  • [CDE-643] - Add def, pv and pvc to the default AMD dependency list when rendering CDE dashboards
  • [CDE-648] - Setting shape opacity and stroke in NewMapComponent
  • [CDE-649] - As an admin I would like to define users/roles with a permission to create/view CDE reports
  • [CDF-637] - As a web developer I want to be able to have more addins on the template component 

Release Notes - Community File Repository - Version 15.10.26

Bug

  • [CFR-3468] - Cannot call remove endpoint on browser window (defined as a POST but still expecting QueryParams instead of FormParam on implementation)

Feature

  • [CFR-3455] - Fully integrate "write" permission with get/set/deletePermissions endpoints 

Release Notes - Community Data Validation - Version 15.10.26

Improvement

  • Maintenance release - keep compatibility with latest ctools version  

Release Notes - Community Distributed Cache - Version 15.10.26

Improvement

  • Maintenance release - keep compatibility with latest ctools version  
CDC NOTES:

Due to a change in the plugin architecture in Pentaho 6, Hazelcast shutdown is no longer performed when the server stops. This leads to the server not shutting down, due to hanging Hazelcast threads.

This will be fixed for Pentaho 6.1, but for now the workaround is to:

  • Edit the file pentaho-solutions/system/applicationContext-spring-security.xml
  • Add the following bean: <bean class="pt.webdetails.cdc.listeners.CdcShutdownListener" />


Saiku Reporting on Kickstarter - One week more to contribute




Sometimes we hear "open source" and imagine some magical realm where things just happen out of the blue. Guess what - it doesn't work like that. It really requires time, dedication and, consequently, money. Some projects are lucky enough to get sponsored by organizations; others have to find a way to survive on their own.

A good example of the latter is the Saiku project. Undoubtedly one of the most widely recognized projects in the Pentaho ecosystem, the Saiku team recently created a Kickstarter project to fund the porting of one of its most highly desired projects to the latest incarnations of Pentaho: the Saiku Reporting project, a frontend to the Pentaho metadata layer.

Now it's time to make it happen. There's still a week to go and, as I write this, £9K left for a successful project.

I've made my pledge, as I strongly believe that Saiku plays a fundamental role in the Pentaho ecosystem. I'm sure you'll agree with me and try to get your company to help as well.

Here's the kickstarter project page

-pedro

Pentaho 6.0.1 Released - Support for El Capitan and Apache Commons Vulnerability fix


It's not usual for us to make a dot release publicly available; that's something we normally do for our Pentaho EE customers only.

But from time to time there's the need to break that rule. And today is one of those days. We just released Pentaho 6.0.1, with the following highlights:

Available for EE as a patch release and CE in the usual places


-pedro

"How do I print CTools dashboards in Pentaho?"


How do I print CTools dashboards in Pentaho?

This is a question that keeps coming back. Even if only for my own reference, I decided to compile the answer I always give into a blog post with links to the relevant resources.

This is a simple question with a not-so-simple answer. Screen and paper obey such different rules that the right approach really depends on what the customer is looking for. There are 3 common approaches:

1. The browser button

Just use the browser print button (or a button in the dashboard that calls window.print()). This has the obvious advantage (dead easy) and the obvious disadvantage as well (the actual output depends on the browser).

2. PhantomJS on server side


Use PhantomJS on the server side. Not drastically different from the previous approach, but it makes it possible to control the output. I know of a customer that even implemented scheduling of PDFs this way. Harder to set up. Some links:

3. My favorite approach - PRD


The truth is that screen and paper are not the same. There's no 1:1 mapping; even with CSS media queries the result will be, at best... meh.

My personal favorite is actually leveraging the most appropriate tool in our stack for printing: PRD. My recommendation is to build a report using the resources of the dashboard (queries and charts) and create a pixel-perfect representation of the dashboard as a report. References:

Have fun!


-pedro

Survey - Wisdom of Crowds® Business Intelligence Market Study - 2016 Edition


I'm getting lazy on blog posts! Gotta change that! Taking a few days vacation but I promise I'll get back at it!

In the meantime, a request to, once again, participate in one of the most influential market studies for BI analytics: Howard Dresner's Wisdom of Crowds, 2016 edition.

Here are the details:



Welcome to the Wisdom of Crowds® Business Intelligence Market Study - 2016 Edition.

Qualified respondents that complete the survey will receive complimentary research for their personal use. This includes end users, IT and independent consultants.

Each section corresponds to an upcoming 2016 research report. The more survey sections that you complete, the more free research you will receive throughout the year!

Please note that this survey should take approximately 25-40 minutes to complete.

The objective of this study is to collect data on trends, vendors and products in Business Intelligence, Enterprise Planning and related markets. As a result we will be able to examine the realities, plans and perceptions surrounding these important markets. We will also rank vendors and products - creating an important tool for those seeking to invest in these solutions.

The underlying principle is this: the more data we collect, the more accurate the results.

This study is NOT sponsored by vendors (or anyone else) and none of your detailed data will be shared with the outside world. So, we respectfully request that you provide us with complete and accurate information - including your name, company, title and business email address.

For consultants: Please respond on behalf of a client or clients that you are working with.

Anonymous survey entries cannot be accepted.

Thank you for participating. I am confident that this will provide an important and fresh perspective into the marketplace for all!

Sincerely,

Howard Dresner
Chief Research Officer
Dresner Advisory Services, LLC

Once again, here's the link to Howard Dresner's Wisdom of Crowds Survey, 2016 edition.

New Ctools Releases - 16.01.22: Commencing Countdown, Engines on


Commencing countdown, engines on

Can you imagine anything more absolutely and completely boring than a product changelog? An endless list of JIRA numbers, bug fixes, features and improvements whose mere sight makes the brain cringe?

Well, think again. This is Pentaho.
Even when the task at hand seems completely disheartening, with some sleight of hand, irreverence and imagination it's possible to bring color into the otherwise black and white scene of technology.


This was exactly what the Pentaho UX team did with the latest Ctools release changelog. If you take a look, you'll find this:


Now, this is undoubtedly a really cool site. But as we all know, the devil is in the details. And there's a tiny, tiny detail here that, at least for me, turned an informative navigation into an emotional experience.
While browsing the site, make sure to pay attention to the fact that there's a star waiting in the sky. And I'm willing to bet that at least some of you won't be able to hold back a genuine smile when clicking there. 


So simple. So easy. So impactful. Hats off to them.
Now, what if the rest of us could bring some of this type of magic to what we do on a daily basis? Can you just imagine what we could all achieve regardless of the task at hand? Yes?

So let's do it.

Back to basics - Compiling the Ctools projects

We've been making some changes to the Ctools projects, gradually migrating them to Maven. It's a work in progress, but you can see that, e.g., CDF and CDA already have that structure.

Unlike Ant, where things just work when you type the command line, Maven requires some extra settings. Our dev team recently documented everything that's needed, so it's all now in the README files.

Here's a screenshot from the CDF README.md:


[Pentaho News] - March 2016


We recently started doing a customer newsletter, and since we consider our community a customer as well, it's my task to share it with you all.

Don't let yourself be fooled by the fact that it starts with a big picture of Casters here - I know it scares all living beings, but read through it; there's extremely valuable info! ;)


-pedro



Pentaho
Pentaho News: March 2016
Hi Pedro,
Welcome to the March edition of our quarterly customer newsletter! Our goal is to share valuable resources that help you get the most out of Pentaho. Click below to learn about our new certification program, local Pentaho events near you, best practices from product experts, and much more!
 
Attend Events

To see our full event and webinar schedule, click here.
 
Share Your Thoughts
5 Minutes for a Chance to Win $200
Take our 5 minute survey to share your thoughts on the media sources you find most valuable, and be entered in our drawing.

Industry survey about BI: Where's the BI and analytics market headed?  Find out! Participate in Wisdom of Crowds®.
 
Get Certified
Together with Hitachi Data Systems, we’ve rolled out the Pentaho Business Analytics Implementation exam, which has replaced the Pentaho Solutions Consultant exam.
Click the registration link below, pick a test center and then select “HH0-590 HDS Certified Specialist - Pentaho Business Analytics Implementation” to get certified.
Learn more or register today!

[Marketplace Spotlight] PDI Xero plugin

Starting a new series today - highlighting some of the contributions made by our community. There are so many that it's hard to keep track, but I will make a serious effort. After all... it's my job... :p



Marketplace Spotlight - PDI Xero Plugin

Product: PDI
Plugin: PDI Xero Plugin
Author: Bulletin.Net (NZ) Limited
Maturity Classification: Level 3, Community Lane (Unsupported) 

Plugin info

This plugin allows you to extract data from the Xero accounting software. I admit I didn't know about it (finances and I don't quite get along), but it looks really interesting.

They offer a trial period, so I took it for a spin. Once I logged in to the application, I went to the demo company:

Demo company dashboard

What the plugin does is allow you to extract and then further process data from Xero. After installing it from the Marketplace, this is what I see:

Xero GET step
So the first thing I need is a customer key. A quick Google search takes me to http://api.xero.com, where I can register something they call a private application.

Registering a private application

Ah - an X509 Public Key Certificate. Always a great excuse to resort to my rusty old openssl skills. So I generated a self-signed certificate:

$ openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -nodes

The plugin doesn't seem to allow a passphrase, so I didn't use one. After generating the certificate and uploading it to the site, I was able to get the info I needed.
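Before uploading, it's worth sanity-checking what openssl produced. A quick sketch, assuming the key.pem / cert.pem filenames from the command above:

```shell
# print the subject and validity window of the certificate we just generated
openssl x509 -in cert.pem -noout -subject -dates

# confirm the private key is a valid RSA key with no passphrase
# (prints "RSA key ok" without prompting when no passphrase was set)
openssl rsa -in key.pem -check -noout
```

If the second command prompts for a passphrase, the plugin won't be able to use the key, so regenerate with -nodes.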

My test application
So now I have everything I need to test this out. I decided to extract the contacts data:

Demo company contacts

Turns out this was dead easy!

The exported results - just as expected

Conclusion


Another great contribution focused on a highly useful use case. I'd like to thank the folks at Bulletin.Net for sharing this with the community!


-pedro

Pentaho 6.1 (EE and CE) is available

Amazing. Every single release is the same. I know months in advance what the release date is. And every-single-f'in-time I somehow manage to mess up the dates and end up in one of those "what do you mean it's tomorrow??" situations...

Oh well. It's done. And I need to sleep.

Anyway. This is, by far, the best release yet! This blog post only gives an overview; there are absolutely amazing details that I'll focus on later.

Pentaho 6.1 (EE and CE) is available

You can get it from the usual places:

Here are some of the main changes:

Data Services improvements

On to my favorite topic - data services. As I mentioned before, with data services we can expose any point of our transformation as a virtual / unmaterialized table that can be accessed from the outside through a JDBC interface.

This will be key to us going forward, and we're improving on this topic

Data services with auto-modeling & Analyzer (EE)

Build model - Data services edition

You can use a Pentaho Data Service as the source in your Build Model job entry, which streamlines the ability to generate data models when you are working with virtual tables. Logical Data Warehouse - I salute you! Insanely powerful stuff.

Parameter push down optimization (CE/EE)

Parameter push down optimization
Data services supported 2 types of optimizations as of 6.0:

  • Cache optimization
  • Query pushdown, which allows passing parameters down to a Table input or MongoDB input
The query push down optimization, while insanely useful, has an obvious drawback: what if we're not using a SQL or MongoDB query?

The parameter push down optimization aims to address that gap. When you issue a data service query of the type "where country='Portugal'", you can say that you want the country value to be mapped to a COUNTRY_QUERY parameter. While this doesn't work for all queries (it's limited to the equals operator; ranges and IN lists aren't supported), it can be used in tons of different use cases. The screenshot above applies this optimization to a REST call, a situation where the format option proves particularly useful.
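To make the mechanics concrete, here's a sketch assuming a hypothetical data service named my_service whose country field has been mapped to a COUNTRY_QUERY parameter:

```sql
-- issued by the client over the data service JDBC connection
SELECT * FROM my_service WHERE country = 'Portugal'
```

With the optimization in place, the transformation runs with COUNTRY_QUERY set to Portugal (optionally templated through the format option, e.g. into a query-string fragment for a REST call) instead of the service having to filter every row after the fact.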

You can see the (amazing!) documentation on data services optimizations here

Simplified JDBC driver download (EE/CE)

Simplified driver download
You're going to hear us talking a lot about usability going forward, and this follows that theme. It's now very easy to get not only the details of how to use the data services JDBC driver, but the driver itself as well.

Download the driver, configure the client, drop the jars and you're done!
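For reference, the connection details follow the usual thin-driver pattern. This is a sketch; the exact host, port and webapp name depend on your install, so use the values the download dialog gives you:

```
Driver class: org.pentaho.di.trans.dataservice.jdbc.ThinDriver
JDBC URL:     jdbc:pdi://localhost:9080/kettle?webappname=pentaho-di
```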

Execute transformation/job dialog improvements (EE/CE)

Run options

Still on Spoon, still on usability: we refreshed the execute transformation / job dialog. You'll notice the difference.

JSON step improvements (EE/CE)

Did you ever need to parse JSON in PDI? Most likely yes. And most likely you felt the performance was, hum... sub-optimal, to say the least.

Well, not anymore. And the credit for this goes entirely to our community friends at Graphiq. They submitted a Fast JSON Input step to the Pentaho Marketplace and we worked with them to incorporate it into the main product.

The result? See for yourself:

JSON step results
Yep - you're looking at, at least, a 10x+ improvement. This is huge. Etienne, Jesse and Nicholas, when we have the chance to meet in person, the beer is on me! :)

Analyzer Inline modeling improvements (EE)

We're improving the ability for end users (with the appropriate permissions) to make global changes to the models available to business users. Two new features arrive in 6.1:

Edit Calculated Measures in Analyzer (EE)

Applying calculated measure to the model

You can now update the properties of a calculated measure created via inline model editing within Analyzer, for example to rename the measure or adjust the MDX formula of the calculation. You can also easily identify calculated measures in the Available fields list by the 'f(x)' icon, which only displays next to calculated measures created in Analyzer via inline model editing.

Show and Hide Available Fields (EE)

Show / hide fields
You can select to hide or show fields in the list of Available fields for a report in Analyzer. Hiding fields is helpful when you want a clear view of only those fields you are interested in for your report. When you hide a field, it is no longer available for selection in the report.

Official CTools documentation

Ah! I promised this in my PentahoWorld presentation last year and we delivered (me keeping my promises doesn't actually happen often :p).

We already supported CTools, but this is a very important checkpoint: it's now part of the official documentation!
CTools documentation in help.pentaho.com


You can now learn the basics of the Community Dashboard Editor (CDE) with the CDE Quick Start Guide and its companion articles:

Metadata Injection (CE)

Oh, you're going to hear a lot about this. Power users know it already. 

I won't dive into much detail because: 1) I'm hungry and it's lunch time, and 2) Jens did an amazing job of that in his blog post.

Metadata Injection (MDI) gives you the ability to modify transformations at execution time. It refers to the dynamic passing of metadata to PDI transformations at run time in order to control complex data integration logic. The metadata (from the data source, a user defined file, or an end user request) can be injected on the fly into a transformation template, providing the “instructions” to generate actual
Metadata Injection principles
What did we do in this release (and will continue doing going forward)? We added to the list of supported steps. This is how things look currently:

List of MDI enabled steps
But like I said - Just go read Jens' blog. It's his baby

Enable community testing of new platform infrastructure  (CE)

I saved the best for last. If MDI is Jens' baby... this is my baby. And my baby is prettier than his! :p

This is something I'll have to dedicate a blog post to later. For now I'll just ask you to do the following, if you're a CE user (EE users will also be able to enable this):

  1. Run the BA server
  2. Open PDI
  3. Create a new repository connection - but point it to your BA server instead of a DI server...
  4. Connect to it
  5. Take a look at the repository explorer. Give yourself a moment to understand what you're looking at
  6. Yep - you're starting to understand.... Hell yeah!  ;)

Have fun!


-pedro

Pentaho Excellence Awards


Hey everyone!



I’m excited to announce that we are now accepting nominations for the third annual Pentaho Excellence Awards! It is through this awards program that we have been able to capture and publicly share customer stories for Caterpillar, CERN, FINRA, Landmark-Halliburton, Ruckus Wireless, NASDAQ and more.

The Pentaho Excellence Awards recognizes the innovative and impactful ways that our customers are using Pentaho. This year in August, we will award one winner in the following five categories:

  • Big Data
  • Embedded Analytics
  • Enterprise Deployment
  • ROI
  • Social Impact

The company with the highest overall score will be awarded the Customer of the Year Award.

The nomination deadline is May 31st.

Learn more at: http://www.pentaho.com/excellence-awards


Cheers


-pedro

CBF2: The ultimate collaboration (and deploy?) guide to Pentaho, Docker and Git

Let's get geeky - this one is huuuuuge! :)

The first contribution


Nearly 9 (?!?! bloody hell, that much??) years ago I wrote my first decent post on the Pentaho Community forums (the first of about 2 or 3 decent posts overall :p).




It was a tutorial on how to set up Pentaho to work in a multi-project environment. That thread dates back to, I think, Pentaho 1.2, Hieroglyph Edition... I didn't even know how to build a solution, but since I couldn't get my head around how to set up the infrastructure, I started with that. It was mainly an ant script that launched the Pentaho platform build targets, with some extra options.

This "Very big pentaho install and setup tutorial" (see how I've always had a way with naming stuff? ;) ) later evolved into what we know today as CBF - the Community Build Framework.



This is an insanely useful project that we used and maintained actively throughout the last decade. It still works, but now we have different paradigms to take into account.

Similar requirements / new approach

These were my premises 9 years ago:

  • I needed to know how to build a demo solution that connects to an arbitrary database and shows simple Pentaho abilities in the least possible time (e.g. have a demo set up in 2 hours on a client's database)
  • I needed to switch configurations on the fly so that I could move between scenarios (e.g. from client A to client B)
  • I didn't want to change any original files that could get overwritten in an upgrade
  • The platform upgrade must be easy to do and not break the rest of the setup
  • Debug is a "must have"
  • Must support all kinds of customizations in different projects, from using different databases (I'm using PostgreSQL, thanks elassad) to different security types
For those, I chose the CBF structure that patched the main source, compiled it, and started it up...

Now I added a few more requirements
  • It shouldn't compile anything anymore; it should work from binaries
  • Should work for CE and EE
  • For EE, should process patches
  • Should work for nightly builds
  • Should setup not only pentaho but all the rest of the environment (just... click and go!)
  • Solutions should be VCS-able in git / svn / whatever
  • Anyone should be able to set things up in the blink of an eye
  • Should greatly increase team collaboration by allowing import / exports of work done
  • and why not... even be deployable?
So I built CBF2. It's called CBF2 because it shares a lot of the same premises as CBF... even though it's not community-only and doesn't build anything :p - I told you I suck at naming stuff.

CBF was a fancy name for an ant build file; CBF2 is a fancy name for a really cool set of bash scripts.

Let me be clear about how huge this is: if you were complaining about the (in)ability to properly manage Pentaho project lifecycles, installs, backups and restores, and configurations... you can stop complaining. This is The Solution (tm) for it.

So, here it is: CBF2, where Pentaho, Docker and Git meet for the ultimate solution lifecycle management


CBF2 - Community Build Framework 2.0

It's not community-only, and you don't actually build anything... but it still rocks!

Purpose

The goal of this project is to quickly spin up a working Pentaho server on docker containers. It also provides script utilities to get the client tools.

Requirements

  • A system with docker. I'm on a Mac, so I have docker-machine
  • A decent shell; either Linux or Mac should work out of the box, and Cygwin should as well
  • lftp
For docker, please follow the instructions for your specific operating system. I use a Mac with Homebrew, so I use docker-machine (4GB mem, 40GB disk, YMMV):
brew install docker
brew install docker-machine
docker-machine create -d virtualbox --virtualbox-memory 4096 --virtualbox-disk-size 40000 dev
eval "$(docker-machine env dev)"   # point the docker CLI at the new machine

How to use

There are a few utilities here:
  • getBinariesFromBox.sh - Connects to Box and builds the main images for the servers (requires access to Box; later I'll do something that doesn't require that)
  • cbf2.sh - What you need to use to build the images
  • getClients.sh - A utility to get the clients tools
  • startClients.sh - A utility to start the client tools

The software directory

This is the main starting point. If you're a Pentaho employee you will have access to the getBinariesFromBox.sh script, but everyone else can still use this by manually putting the files here.
You should put the official software files under the software/v.v.v directory. It's very important that you follow this 3-number representation.
This works for both CE and EE. It actually works better for EE, since you can also put the patches there and they will be processed.
For EE, you should use the official -dist.zip artifacts. For CE, use the normal .zip file.

The licenses directory

For EE, just place the *.lic license files in the licenses subdirectory. They will be installed on the images for EE builds.

Released versions:

Create a directory named X.X.X, and inside drop the server, plugins and patches.

Nightly Builds

Drop the build artifacts directly in a directory named after the build.
Example:
software/
├── 5.2.1
│   ├── SP201502-5.2.zip
│   ├── biserver-ee-5.2.1.0-148-dist.zip
│   ├── paz-plugin-ee-5.2.1.0-148-dist.zip
│   ├── pdd-plugin-ee-5.2.1.0-148-dist.zip
│   └── pir-plugin-ee-5.2.1.0-148-dist.zip
├── 5.4.0
│   └── biserver-ce-5.4.0.0-128.zip
├── 5.4.1
│   ├── SP201603-5.4.zip
│   └── biserver-ee-5.4.1.0-169-dist.zip
├── 6.0.1
│   ├── SP201601-6.0.zip
│   ├── SP201602-6.0.zip
│   ├── SP201603-6.0.zip
│   ├── biserver-ce-6.0.1.0-386.zip
│   ├── biserver-ee-6.0.1.0-386-dist.zip
│   ├── paz-plugin-ee-6.0.1.0-386-dist.zip
│   ├── pdd-plugin-ee-6.0.1.0-386-dist.zip
│   └── pir-plugin-ee-6.0.1.0-386-dist.zip
├── 6.1-QAT-153
│   ├── biserver-ee-6.1-qat-153-dist.zip
│   ├── biserver-merged-ce-6.1-qat-153.zip
│   ├── paz-plugin-ee-6.1-qat-153-dist.zip
│   ├── pdd-plugin-ee-6.1-qat-153-dist.zip
│   └── pir-plugin-ee-6.1-qat-153-dist.zip
├── 7.0-QAT-76
│   ├── biserver-merged-ee-7.0-QAT-76-dist.zip
│   ├── pdd-plugin-ee-7.0-QAT-76-dist.zip
│   └── pir-plugin-ee-7.0-QAT-76-dist.zip
└── README.txt

CBF2: The main thing

CBF1 was an ant script; CBF2 is a bash script. So yeah, you want cbf2.sh. If you are on Windows... well, not sure I actually care, but you should be able to just use Cygwin.
Here's what you'll see when you run ./cbf2.sh:
--------------------------------------------------------------
--------------------------------------------------------------
------ CBF2 - Community Build Framework 2 -------
------ Version: 0.9 -------
------ Author: Pedro Alves (pedro.alves@webdetails.pt) -------
--------------------------------------------------------------
--------------------------------------------------------------

Core Images available:
----------------------

[0] baserver-ee-5.4.1.0-169
[1] baserver-ee-6.0.1.0-386
[2] baserver-merged-ce-6.1-qat-153
[3] baserver-merged-ee-6.1.0.0-192

Core containers available:
--------------------------

[4] (Stopped): baserver-ee-5.4.1.0-169-debug

Project images available:
-------------------------

[5] pdu-project-nasa-samples-baserver-ee-5.4.1.0-169
[6] pdu-project-nasa-samples-baserver-merged-ee-6.1.0.0-192

Project containers available:
-----------------------------

[7] (Running): pdu-project-nasa-samples-baserver-ee-5.4.1.0-169-debug
[8] (Stopped): pdu-project-nasa-samples-baserver-merged-ee-6.1.0.0-192-debug

> Select an entry number, [A] to add new image or [C] to create new project:
There are 4 main concepts here:
  • Core images
  • Core containers
  • Project images
  • Project containers
These should be straightforward if you're familiar with docker, but in a nutshell there are two fundamental concepts: images and containers. An image is an inert, immutable file; a container is an instance of an image, and it's the container that runs and lets us access the Pentaho platform.

Accessing the platform

When we run the container, it exposes a few ports, most importantly 8080. So in order to see Pentaho running, all we need to do is access the machine where docker is running. This part may vary depending on the operating system; on a Mac, using docker-machine, there's a separate VM running everything, so I access the platform (at the IP reported by docker-machine ip dev) using the following URL:
http://192.168.99.100:8080/pentaho/Home

Core images

These are the core images - a clean install of one of the artifacts provided in the software directory. So the first thing we should do is add a core image. The option [A] allows us to select which image to build from an official distribution archive.
When we select this option, we are prompted to choose the version we want to build:
> Select an entry number, [A] to add new image or [C] to create new project: A

Servers found on the software dir:
[0]: biserver-ee-5.2.1.0-148-dist.zip
[1]: biserver-ce-5.4.0.0-128.zip
[2]: biserver-ee-5.4.1.0-169-dist.zip
[3]: biserver-ce-6.0.1.0-386.zip
[4]: biserver-ee-6.0.1.0-386-dist.zip
[5]: biserver-ee-6.1-qat-153-dist.zip
[6]: biserver-merged-ce-6.1-qat-153.zip
[7]: biserver-merged-ee-7.0-QAT-76-dist.zip
CBF2 knows how to handle EE dist files: you'll be presented with the EULA, patches will be automatically processed, and licenses will be installed.
Once an image is built, selecting its number gives you the option to launch a new container or delete the image:
> Select an entry number, [A] to add new image or [C] to create new project: 0
You selected the image baserver-ee-6.0.1.0-386
> What do you want to do? (L)aunch a new container or (D)elete the image? [L]:

Core containers

You can launch a container from a core image. This allows us to explore a completely clean version of the selected image, which is useful for some tests, but I'd say the big value comes from the project images. Here are the options available for containers:
> Select an entry number, [A] to add new image or [C] to create new project: 3

You selected the container baserver-merged-ce-6.1-qat-153-debug
The container is running; Possible operations:

S: Stop it
R: Restart it
A: Attach to it
L: See the Logs

What do you want to do? [A]:
Briefly, here's what the options mean - even though they should be relatively straightforward:
  • Stop it: Stops the container. When the container is stopped you'll be able to delete the container or start it again
  • Restart it: Guess what? It restarts it. Surprising, hein? :)
  • Attach to it: Attaches to the docker container. You'll then have a bash shell and you'll be able to play with the server
  • See the Logs: Gets the logs from the server

Projects

Definition and structure

A project is built on top of a core image. Instead of being a clean install, it's meant to replicate a real project's environment. As a best practice, it should also have a well-defined structure that can be stored in a VCS repository.
Projects should be cloned / checked out into the projects directory. I recommend versioning every project in a separate git or svn repository. Here's the structure that I have:
pedro@orion:~/tex/pentaho/cbf2 (master *) $ tree  -l ./projects/
./projects/
└── project-nasa-samples -> ../../project-nasa-samples/
    ├── _dockerfiles
    └── solution
        └── public
            ├── Mars_Photo_Project
            │   ├── Mars_Photo_Project.cda
            │   ├── Mars_Photo_Project.cdfde
            │   ├── Mars_Photo_Project.wcdf
            │   ├── css
            │   │   └── styles.css
            │   ├── img
            │   │   └── nasaicon.png
            │   └── js
            │       └── functions.js
            └── ktr
                ├── NASA\ API\ KEY.txt
                ├── curiosity.ktr
                ├── getPages.ktr
                └── mars.ktr
All the solution files are going to be automatically imported, including metadata for datasources creation.
The _dockerfiles directory is a special one: you can override the default Dockerfile used to build a project image (the file in dockerfiles/buildProject/Dockerfile) by dropping a project-specific Dockerfile in that directory, using the default one as an example. Note that you should not change the FROM line, as it will be dynamically replaced. This is the place for project-level configurations, like installing / restoring a specific database, putting an apache server in front, or any fine-tuned configuration.
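As an illustration, a project Dockerfile dropped into _dockerfiles could look like the sketch below. The package install is hypothetical (it assumes a Debian-based base image); the FROM line is just a placeholder, since CBF2 replaces it dynamically:

```dockerfile
# FROM is replaced dynamically by CBF2 -- do not change this line
FROM baseimage

# hypothetical project-level customization: a postgresql client,
# e.g. for restoring the project's database at build time
RUN apt-get update && apt-get install -y postgresql-client
```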

Project images

The first thing we need to do is create a project. Doing that is very simple: select one of the projects in our projects directory and a core image to install it against. This separation aims at really simplifying upgrades, tests, etc.
> Select an entry number, [A] to add new image or [C] to create new project: C

Choose a project to build an image for:

[0] project-nasa-samples

> Choose project: 0

Select the image to use for the project

[0] baserver-ee-6.0.1.0-386
[1] baserver-merged-ce-6.1-qat-153
[2] baserver-merged-ee-6.1.0.0-192

> Choose image: 2
Once the project image is created, we have access to the same options we had for the core images: basically, launching a container or deleting the image.

Project containers

Like the images, project containers work very similarly to core containers, but two extra options become available:
  • Export the solution: Exports the solution to our project folder
  • Import the solution: Imports the solution from our project folder into the running container. This is equivalent to rebuilding the image
Note that by design CBF2 only exports the folders under public that are already part of the project. If you add a new top-level folder, you'll need to create the directory manually.

The client tools

This also provides two utilities to handle the client tools. One of them, getClients.sh, is probably something you can't use, since it's for internal Pentaho people only.
The other one, startClients.sh, may be more useful. It requires the client tools to be downloaded into a dir called clients/ with a certain structure:
pedro@orion:~/tex/pentaho/cbf2 (master *) $ tree -L 4 clients/
clients/
├── pad-ce
│   └── 6.1.0.0
├── pdi-ce
│   ├── 6.1-QAT
│   │   └── 156
│   │       └── data-integration
│   ├── 6.1.0.0
│   │   └── 192
│   │       └── data-integration
│   └── 7.0-QAT
│       └── 57
│           └── data-integration
├── pdi-ee-client
│   └── 6.1.0.0
│       └── 192
│           ├── data-integration
│           ├── jdbc-distribution
│           └── license-installer
├── pme-ce
│   └── 6.1.0.0
│       └── 182
│           └── metadata-editor
├── prd-ce
│   └── 6.1.0.0
│       └── 182
│           └── report-designer
└── psw-ce
    └── 6.1.0.0
If you use this structure, then startClients.sh simplifies launching them. Note that, unlike the platform, these run on the local machine, not on the docker VM:
pedro@orion:~/tex/pentaho/cbf2 (master *) $ ./startClients.sh
Clients found:
--------------

[0] pdi-ce: 6.1-QAT-156
[1] pdi-ce: 6.1.0.0-192
[2] pdi-ce: 7.0-QAT-57
[3] pdi-ee-client: 6.1.0.0-192
[4] pme-ce: 6.1.0.0-182
[5] prd-ce: 6.1.0.0-182

Select a client:

Taking it further

This is, first and foremost, a developer's tool and methodology. I'll make no considerations or recommendations regarding using these containers in a production environment, because I simply have no idea how that would work in your setup; we're mostly agnostic on those methods.
Pentaho's stance is clearly explained here:
As deployments increase in complexity and our clients rapidly add new software
components and expand software footprints, we have seen a definitive shift
away from traditional installation methods to more automated/scriptable
deployment approaches. At Pentaho, our goal is to ensure our clients continue
to enjoy flexibility to adapt our technology to their environments and
individual standards.

Throughout 2015, Pentaho worked with customers who use various deployment
technologies in development, test, and production environments. We have seen
that the range of technologies used for scripted software deployment can vary
as widely as the internal IT standards of our clients. In short, we have not
found critical mass in any single deployment pattern.

To support our clients in their adoption of these technologies, Pentaho takes
the perspective that our clients should continue to be autonomous in their
selection and implementation of automated deployment and configuration
management.

Pentaho will provide documented best practices, based on our experience and
knowledge of our product, to assist our clients in understanding the
scriptable and configurable options within our product, along with our
deployment best practices. Due to the diversity of technology options, Pentaho
customer support will remain focused on the behavior of the Pentaho software
and will provide expertise on the Pentaho products to help customers
troubleshoot individual scripts or containers.


Have fun. Tips and suggestions to pedro.alves at webdetails.pt

[Marketplace Spotlight] Stream Schema Merge


Stream Schema Merge

The contribution


Our friends at Graphiq did it again. Not only did they recently submit a great contribution, the Fast JSON Input step that we incorporated in 6.1; they've now added a new one, the Stream Schema Merge plugin.

Andrew wrote an amazing blog post describing it; I can't possibly do better than they did, so I'm really just highlighting their effort. But I'd like to briefly describe what it does.

How it works

I admit at first I was a bit confused: why the hell would I need a step to do something that PDI does natively?

Then I saw this:


And this:

Then it all made sense! It's bloody genius! The trick is that with this step there's no longer the requirement of converting all streams to the same structure. Not only does this simplify the transformation a lot, it really improves performance: every time we change the structure of a stream there's the cost of building a new Object[][].

And with this step we don't need that.

Amazing job, guys! Honestly, after seeing this, the question I ask myself is: why doesn't PDI just behave like this by default? And on that, we may talk again, Andrew and team :p




Adding Metadata Injection Support to a Pentaho Data Integration Step

As I wrote in a previous blog post - and not nearly as well as Jens did on his - Metadata Injection is kind of a big deal around here. You may have used it, you may have heard about it; if not, I'm sure you will in the future.

Simply put, Metadata Injection is what allows a transformation to change itself at run time, dynamically adapting as needed to different inputs, different rules, and different outputs.

In order to do that, the individual steps have to support it. We've been doing a huge amount of catch-up work to increase the list of steps that do, but that's not enough - we need your help! I'd like each of you to also add MDI support to the steps you've been contributing to the Marketplace.

To facilitate that, the engineering team prepared the following instructions (and here's a link to a concrete implementation).

Adding Metadata Injection Support to Your Step


You can add metadata injection support to your step by marking the metadata class and the step's fields with injection-specific annotations. Use the @InjectionSupported annotation to specify that your step supports metadata injection. Then, use the @Injection annotation to specify which fields in your step can be injected as metadata, or the @InjectionDeep annotation for fields more complex than the usual primitive types (such as string, int, float, etc.).

InjectionSupported

Use the @InjectionSupported annotation in the metadata class of your step to indicate that it supports metadata injection. This annotation has the following parameters.

  • localizationPrefix - Indicates where your messages live in the /messages/messages_<locale>.properties file. When the metadata injection properties are displayed in PDI, the description for each field is retrieved from the localization file using the key <localizationPrefix><FIELD_NAME>.
  • groups - The optional names of the groups used to arrange your fields. Fields appear in these groups in the ETL Metadata Injection step properties dialog.

For example, setting the localizationPrefix parameter to "Injection." means that the description for a field named "FILENAME" is retrieved with the key "Injection.FILENAME". The following @InjectionSupported annotation declares that prefix along with the optional "GROUP1" and "GROUP2" groups:
@InjectionSupported(localizationPrefix="Injection.", groups = {"GROUP1","GROUP2"})
If your step already has metadata injection support using a pre-6.0 method (for example, it returns an object from the getStepMetaInjectionInterface() method), you will need to remove the injection class and the getStepMetaInjectionInterface() method from the metadata class. Once they are removed, getStepMetaInjectionInterface() is inherited from the base class (BaseStepMeta) and returns null, which indicates that your step does not support pre-6.0 style metadata injection. If your step did not use that style of implementation, you do not need to add or manually modify this method.
Although inheritance applies to injectable fields specified by the @Injection and @InjectionDeep annotations, you still need to apply the @InjectionSupported annotation to any step inheriting the injectable fields from another step. For example, if an existing input step has already specified injectable fields through the @Injection annotation, you do not need to use the @Injection annotation for fields you inherited within the step you create. However, you still need to use the @InjectionSupported annotation in the metadata class of your step even though that annotation is also already applied in the existing input step.
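To see how these pieces fit together, here is a minimal, self-contained sketch of a metadata class marked up for injection, plus the kind of reflection-based scan the injection machinery performs. Note: the annotation definitions below are simplified stand-ins for Pentaho's real ones (which live in org.pentaho.di.core.injection), included only so the snippet compiles on its own, and MyStepMeta with its FILENAME/LIMIT fields is a hypothetical example.

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for Pentaho's annotations (the real ones live in
// org.pentaho.di.core.injection) so this sketch compiles on its own.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface InjectionSupported { String localizationPrefix(); String[] groups() default {}; }

@Retention(RetentionPolicy.RUNTIME) @Target({ElementType.FIELD, ElementType.METHOD})
@interface Injection { String name(); String group() default ""; }

// A hypothetical step metadata class marked as supporting injection.
@InjectionSupported(localizationPrefix = "Injection.", groups = {"FILE_GROUP"})
class MyStepMeta {
  @Injection(name = "FILENAME", group = "FILE_GROUP")
  String filename;

  @Injection(name = "LIMIT")
  int limit;
}

public class InjectionScanDemo {
  // Collect the injectable field names the way an MDI engine discovers them:
  // by reflecting over fields annotated with @Injection on a class that is
  // itself annotated with @InjectionSupported.
  static List<String> injectableNames(Class<?> metaClass) {
    List<String> names = new ArrayList<>();
    if (metaClass.isAnnotationPresent(InjectionSupported.class)) {
      for (Field f : metaClass.getDeclaredFields()) {
        Injection inj = f.getAnnotation(Injection.class);
        if (inj != null) names.add(inj.name());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    // Prints the injectable field names declared on MyStepMeta.
    System.out.println(injectableNames(MyStepMeta.class));
  }
}
```

In a real step you would import the Pentaho annotations instead of declaring them; the point here is only that injection support is declared entirely through annotations and discovered by reflection at run time.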

Injection

Each field (or setter) you want to be injected into your step should be marked by the @Injection annotation. The parameters of this annotation are the name of the injectable field and the group containing this field:
@Injection(name = "FILENAME", group = "FILE_GROUP") - on the field or setter
This annotation has the following parameters.
  • name - The name of the injectable field. If the annotation is declared on a setter (a typical setter with no return type that accepts a single parameter), the parameter's type is used for data conversion, as if the annotation were declared on a field.
  • group - The group containing the field. If no group is specified, the field is shown at the root of the ETL Metadata Injection step properties dialog.
The data type declared for the field will be used for conversion from the dataset to injection. See the RowMetaAndData.getAsJavaType() method for possible type combinations.
Currently supported datatypes for fields: String, boolean, int, long, and enum. Autoboxing for primitive types is also supported, so Long and Boolean may be used as well. Supported datatypes for dataset: TYPE_STRING, TYPE_BOOLEAN, TYPE_INTEGER.

This annotation can be used:
  • On a field of a simple type (String, int, float, etc.)
  • On the setter of a simple type
  • On an array of simple types
  • On a java.util.List of simple types (the List must be declared with a generic type)
Beyond these types, you need to understand the special rules for enums, arrays, and data type conversions.

Enums

You can mark any enum field with the @Injection annotation. For enum fields, metadata injection converts source TYPE_STRING values into enum values of the same name. For your user to be able to use any specified values, all possible values should be described in the documentation of your metadata injection step.
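The name-based conversion described above can be sketched in plain Java. This mirrors the behaviour (a TYPE_STRING value is matched against enum constants of the same name), not Pentaho's actual implementation; FileType is a hypothetical enum.

```java
public class EnumInjectionDemo {
  // Hypothetical enum field type in a step's metadata class.
  enum FileType { CSV, FIXED, XML }

  // Mimics how MDI maps an incoming string value onto the enum constant
  // with the same name; an unknown name is a user error and surfaces as
  // an IllegalArgumentException.
  static FileType toFileType(String value) {
    return Enum.valueOf(FileType.class, value);
  }

  public static void main(String[] args) {
    System.out.println(toFileType("CSV"));      // CSV
    try {
      toFileType("PARQUET");                    // not a constant of FileType
    } catch (IllegalArgumentException e) {
      System.out.println("unknown value rejected");
    }
  }
}
```

This is why documenting all valid values matters: the user typing the injected value has no dropdown to pick from, and anything that is not an exact constant name fails at run time.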

Arrays

The @Injection annotation can be added to an array field:
@Injection(name="FILES")
private String[] files;
The metadata object can also have a more complex structure:
MyStepMeta.java
public class MyStepMeta {
  @InjectionDeep
  private OneFile[] files;

  public class OneFile {
    @Injection(name="NAME", group="FILES")
    public String name;
    @Injection(name="SIZE", group="FILES")
    public int size;
  }
}
Metadata injection creates objects for each row from the injection information stream. The number of objects equals the number of rows in the information stream. If different injections (like NAME and SIZE in the example above) are loaded from different information streams, you have to make sure that the row numbers are equal on both streams.
Note: Instead of an array, you could use java.util.List with generics.

Data Type Conversions

The DefaultInjectionTypeConverter class converts values from the RowSet to the simple type of a field. Currently supported data types for fields are string, boolean, integer, long, and enum. You can also define non-standard custom converters for some fields by declaring them in the 'converter' attribute of the @Injection annotation. Such custom converters extend the InjectionTypeConverter class.
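As a rough illustration of what a default type converter does, the sketch below coerces the string value coming from the injection row into the target field type. This is only an approximation of the idea; Pentaho's real logic lives in DefaultInjectionTypeConverter and RowMetaAndData.getAsJavaType(), and the "Y" handling for booleans here is an assumption based on PDI's usual Y/N conventions.

```java
public class TypeConversionDemo {
  // Coerce a string value from the injection stream to the declared type of
  // the target field. Illustrative only - not Pentaho's actual converter.
  static Object convert(String value, Class<?> targetType) {
    if (targetType == String.class) return value;
    if (targetType == int.class || targetType == Integer.class) return Integer.parseInt(value);
    if (targetType == long.class || targetType == Long.class) return Long.parseLong(value);
    if (targetType == boolean.class || targetType == Boolean.class)
      // Assumption: accept both "true" and PDI-style "Y" as true.
      return Boolean.parseBoolean(value) || "Y".equalsIgnoreCase(value);
    throw new IllegalArgumentException("unsupported target type: " + targetType);
  }

  public static void main(String[] args) {
    System.out.println(convert("42", int.class));     // 42
    System.out.println(convert("Y", boolean.class));  // true
  }
}
```

A custom converter declared via the 'converter' attribute would replace exactly this kind of dispatch for the annotated field.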

InjectionDeep

Only the fields of the metadata class (and its ancestors) are checked for annotations, which works well for simple structures. If your metadata class contains more complex structures (beyond primitive types), mark those fields with the @InjectionDeep annotation so the annotations inside these complex (non-primitive) fields are inspected as well. For example:
@InjectionDeep
private OneFile[] files;

This annotation can be used on arrays or java.util.List fields of complex classes.

New book available: Learning Pentaho CTools


Call me old fashioned, but even in this era of blogs, social media, and info scattered across the net, it's always amazing to see a book covering our technology. It just feels more real when you can actually physically pick something up.

And, once again I may be the only one thinking this, but I see it as a great milestone in a project's lifecycle; it's not a pet project anymore - it's the real deal.

And the latest addition is particularly close to me for obvious reasons: Miguel Gaspar, one of our consultants, went ahead and published a book covering Pentaho CTools.



I'm obviously really proud to see a book coming out on a project I had the opportunity to work on from the ground up (even though these days the team always kicks me out of the dev room and doesn't let me touch a single line of code...). I did a quick search and found the forum post, over 8 years old, that announced the first release ever of CDF, the first CTool.

Amazing stuff, Miguel. Thanks for doing this; I would never be able to write a book (hum... well, technically I actually wrote one, but... meh, ignore).

Other Pentaho literature

We've been lucky enough to already have quite a few books covering several aspects of Pentaho - the platform, a couple covering PDI, and... well, more than I can link. And I see this as a huge advantage: I'm sure any prospect or customer will feel much more confident seeing this huge amount of resources available.

Just take a look at this screenshot with the search results from https://www.packtpub.com/all/?search=pentaho. Mind blowing, and that's from one publisher alone (there are more from Wiley and others). It's just what you get when you work with a great community :)


Cheers!


-pedro



A new Pentaho Library on the community site

CTools, IoT Smart Cities, and More

Pentaho has a customer newsletter that sometimes features an interview with a very smart person. But being summer and all, everyone was on vacation! They had absolutely no one left to interview, and the final choice was between me and Amanda, the office security lady.

Unfortunately, Amanda declined, so... hum... hello



So if you're extremely bored, click here to read a few things about how customers are contributing to the Pentaho community, use cases of CTools, cool projects on IoT with Hitachi Smart Cities and the City of Copenhagen, what's next in Pentaho 7.0 and more! (did you notice how I always enumerate as much as I possibly can and then end up with "and more" or "etc" when in reality I don't have anything else to say? Cool tactic, hum?)


-pedro

Pentaho Ctools Release 16.08.18


Release Notes - Community Dashboard Editor - Version 6.1-16.08.18

Bug

  • [CDE-778] - Error in components when resultSet is empty and the 'Column Type' attribute is defined.
  • [CDE-802] - Export Button gets wrong results when tables use input html tags other than filter's one
  • [CDE-808] - Incorrect text in Map Component Reference
  • [CDE-822] - CGG Dial Component replicates when parameters change
  • [CDE-825] - DashboardComponent: parameter propagation should take into account all the mapped parameters
  • [CDE-831] - Sample View Manager doesn't open (only Legacy)
  • [CDE-837] - FilterComponent - html injection
  • [CDE-858] - Openlayers map not cleaning selection when some features are loaded selected
  • [CDF-826] - BlockUI does not appear when a certain dashboard is embedding another one using dashboard component.

Improvement

  • [CDE-824] - DashboardComponent: Make parameter propagation happen both ways
  • [CDE-826] - DashboardComponent: Expose a way to turn on and off the parameter propagation
  • [CDE-840] - Expose a option in the dashboard editor to modify the failureCallback property of the tablecomponent

New Feature

  • [CDF-603] - CCC - Realtime - Sliding Window to cope with constantly incoming data
For 5.x:

Release Notes - Community Dashboard Editor - Version 16.08.18

Bug

  • [CDF-826] - BlockUI does not appear when a certain dashboard is embedding another one using dashboard component.
  • [CDE-858] - Openlayers map not cleaning selection when some features are loaded selected


Release Notes - Community Dashboard Framework - Version 6.1-16.08.18

Bug

  • [CDF-449] - CCC - On timeseries charts, the zeroline of the base axis shows up on 1970
  • [CDF-452] - CCC - Treemap - specifying the colorMap option throws an error
  • [CDF-809] - Samples under plugin-samples > CDF have some issues
  • [CDF-826] - BlockUI does not appear when a certain dashboard is embedding another one using dashboard component.
  • [CDF-865] - Dashboard require - Radio Button component default type "checkbox"
  • [CDF-871] - PrptComponent sample executes the components in an arbitrary order
  • [CDF-872] - Missing dependency in the CDF AMD broadcast sample
  • [CDF-875] - On the Filter Component with the sortByLabel enabled, and a page length defined: scrolling down to fetch more items creates a loop until the last selected item is found
  • [CDF-881] - TableComponent with paginateServerSide set to true, will trigger 2 queries
  • [CDF-888] - Table Component cannot updated if paginate Server side is true
  • [CDF-895] - CCC - Stacked area chart has an incorrect behaviour when one of the series has null and not null values.
  • [CDF-896] - CDF Storage: any user is able to change the storage of another user
  • [CDF-912] - CCC - Axis tick label overflows - layout fails to take axis offset into account
  • [CDF-913] - CCC - Axis tick label overflows - ignores fixed or maximum axis sizes
  • [CDF-917] - CCC - Axis tick label overflows - fails on fixed categorical bands layout
  • [CDF-918] - CCC - Metric/Scatter chart - cannot set axis offset to 0
  • [CDF-919] - CCC - Axis tick label overflows - fails when OverlappedLabelsMode is "hide"
  • [CDE-778] - Error in components when resultSet is empty and the 'Column Type' attribute is defined.
  • [CDE-837] - FilterComponent - html injection

Improvement

  • [CDF-670] - As a dashboard developer using a TableComponent, I would like to be able to provide a friendly error message when a query fails

New Feature

  • [CDF-603] - CCC - Realtime - Sliding Window to cope with constantly incoming data

Story

  • [CDF-713] - As a user, I'd like to be able to easily change the datasource used by a component in the preExec function.
For 5.x:

Release Notes - Community Dashboard Framework - Version 16.08.18

Bug

  • [CDF-449] - CCC - On timeseries charts, the zeroline of the base axis shows up on 1970
  • [CDF-452] - CCC - Treemap - specifying the colorMap option throws an error
  • [CDF-826] - BlockUI does not appear when a certain dashboard is embedding another one using dashboard component.
  • [CDF-865] - Dashboard require - Radio Button component default type "checkbox"
  • [CDF-871] - PrptComponent sample executes the components in an arbitrary order
  • [CDF-872] - Missing dependency in the CDF AMD broadcast sample
  • [CDF-895] - CCC - Stacked area chart has an incorrect behaviour when one of the series has null and not null values.
  • [CDF-912] - CCC - Axis tick label overflows - layout fails to take axis offset into account
  • [CDF-913] - CCC - Axis tick label overflows - ignores fixed or maximum axis sizes
  • [CDF-917] - CCC - Axis tick label overflows - fails on fixed categorical bands layout
  • [CDF-918] - CCC - Metric/Scatter chart - cannot set axis offset to 0
  • [CDF-919] - CCC - Axis tick label overflows - fails when OverlappedLabelsMode is "hide"

New Feature

  • [CDF-603] - CCC - Realtime - Sliding Window to cope with constantly incoming data

Release Notes - Community Data Access - Version 6.1-16.08.18

Bug

  • [CDA-183] - CDA File Editor is not working correctly
  • [CDA-188] - Using "security:principalRoles" as cache key creates a new entry on the cache every time the query is run

Release Notes - Community Graphics Generator - Version 6.1-16.08.18

Improvement

  • Upgraded to the latest CCC release
