The next step is to build the test data using RowGen job scripts automatically created in the IRI Workbench GUI, built on Eclipse™. The GUI's New DB Test Data job wizard for RowGen will connect to the same tables, parse their DDL, and produce a data generation batch operation that will run in Pentaho's Shell step:

While you can certainly add the Shell step to a larger Pentaho project, I'm only showing the steps needed to run the test data generation job. Create the job with a Start step and use the Shell step to reference the RowGen batch file created above:

After the Pentaho/RowGen job is executed, you will see your tables populated with the test data. Explore the data source again in Pentaho:

For questions about the use of RowGen or its callability from third-party applications, contact IRI by email. Also make sure you saw our previous article on masking production data in Pentaho.

Add a checksum column for each input row.
Add one or more constants to the input rows.
Add a sequence that depends on field value changes. Each time the value of at least one field changes, PDI resets the sequence.
Encode several fields into an XML fragment.
Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST).
Pull streaming data from an AMQP broker or clients through an AMQP transformation.
Publish messages in near-real-time to an AMQP broker.
Decode binary or JSON Avro data and extract fields from the structure it defines, either from flat files or incoming fields.
Generate documentation automatically based on input in the form of a list of transformations and jobs.
Block flow until all incoming rows have been processed. Subsequent steps only receive the last input row to this step.
Block this step until selected steps finish.
Serialize data into Avro binary or JSON format from the PDI data stream, then write it to file.
Create new fields by performing simple calculations.
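Calling a generated batch file from a Shell step comes down to running an external script and checking whether it succeeded. The following is only a minimal Python sketch of that idea, not how Pentaho or RowGen implement it. The function name `run_job_script` is hypothetical, and a Python one-liner stands in for the batch file so that the sketch runs on any machine. Treating a nonzero exit code as failure is an assumption about how the script reports errors.

```python
import subprocess
import sys


def run_job_script(command):
    """Run an external command and return (exit_code, combined_output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


# Stand-in for the generated batch file (hypothetical): a tiny Python
# one-liner, so the sketch does not depend on any installed product.
stand_in = [sys.executable, "-c", "print('rows generated')"]

code, output = run_job_script(stand_in)
if code != 0:
    # Assumption: the script signals failure through a nonzero exit code.
    raise RuntimeError(f"job script failed with exit code {code}: {output}")
```

In a real job, the command list would instead hold the path to the batch file, and the surrounding job would decide what to do on failure.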
By calling RowGen jobs from Pentaho, you can supply data with the structure and relationships needed for immediate ETL and BI testing, without exposing personally identifiable information.

While Pentaho Data Integration (PDI) has a number of database tools, it does not have the native capability to create safe, intelligent test data. This becomes important when you want to prototype ETL operations, share new views or reports with co-workers, and develop new applications without relying on production data. IRI RowGen software populates tables and flat files with benign test data for use in Pentaho and other applications. You would use the Shell step in Pentaho to call pre-defined RowGen job (or batch job) scripts.

We'll begin the example with empty tables to be populated. RowGen will soon rely on their DDL information to generate structurally and referentially correct test data. The Pentaho view of this stage setting is shown below:
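To make "structurally and referentially correct" concrete, here is a small Python sketch of the general idea: parent rows get unique keys, and every child row's foreign key is drawn from those keys, while all values are synthetic. This illustrates the concept only and is not RowGen's method. The table and column names (`customers`, `orders`, `cust_id`, and so on) are hypothetical.

```python
import random
import string


def make_customers(n, rng):
    """Build n synthetic parent rows with unique keys and random, non-real names."""
    return [
        {"cust_id": i, "name": "".join(rng.choices(string.ascii_uppercase, k=8))}
        for i in range(1, n + 1)
    ]


def make_orders(n, customers, rng):
    """Build n child rows whose cust_id always points at an existing customer."""
    ids = [c["cust_id"] for c in customers]
    return [
        {"order_id": i, "cust_id": rng.choice(ids), "amount": round(rng.uniform(5, 500), 2)}
        for i in range(1, n + 1)
    ]


rng = random.Random(42)  # fixed seed so repeated runs produce the same rows
customers = make_customers(5, rng)
orders = make_orders(20, customers, rng)
```

Because the foreign keys are sampled from the generated parent keys, joins between the two sets never hit an orphaned row, and no production value is involved at any point.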
We first demonstrate how to improve sorting performance, and then introduce ways to mask production data and create test data in the Pentaho Data Integration (PDI) environment.

Abstract: IRI RowGen generates safe, realistic test data for multiple database and file targets, according to business rules.
This article is the third in a three-part series on using IRI products to expand functionality and improve performance in Pentaho systems.