Reports

Reports are defined as JSON objects that must define these fields:

name: the name used to run the report from the CLI.
type: the report's type. Supported report types are described below.
results: list of result configurations that define how to save the report (see the Results documentation).
A set of required or optional fields specific to each report type, detailed below.
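
As a rough illustration of the common fields listed above, the following sketch checks a report definition before it is run. It is not part of laika: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names used only for this example.

```python
import json

# Fields every report definition must carry, per the list above.
REQUIRED_FIELDS = ("name", "type", "results")


def missing_fields(report):
    """Return the required fields that are absent from a report definition."""
    return [field for field in REQUIRED_FIELDS if field not in report]


report = json.loads("""
{
  "name": "my_file_report",
  "type": "file",
  "filename": "/path/to/filename.xlsx",
  "results": []
}
""")

print(missing_fields(report))                  # []
print(missing_fields({"name": "incomplete"}))  # ['type', 'results']
```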

File

type: file. This report reads data from a local file. These are the configurations:

filename: path to the file. Laika parses the file based on its extension. csv and tsv files are parsed out of the box. To parse Excel files, you need to install the laika-lib[excel] dependency.
raw: if this parameter is set to true, the file's extension is ignored and the file contents are passed to the result unparsed.

Example of a file report:

{
  "name": "my_file_report",
  "type": "file",
  "filename": "/path/to/filename.xlsx",
  "results": [...]
}
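
The interplay between filename and raw can be pictured with the sketch below. It uses only the standard library and is an assumption about the general idea, not laika's actual parser: `DELIMITERS` and `parse_file` are hypothetical names, and Excel handling is left out.

```python
import csv
import io
import os

# Assumed mapping from extension to delimiter; laika's own parser may differ.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def parse_file(filename, content, raw=False):
    """Parse content by file extension, or return it untouched when raw is set."""
    if raw:
        return content
    extension = os.path.splitext(filename)[1].lower()
    if extension not in DELIMITERS:
        raise ValueError(f"no parser for {extension!r} in this sketch")
    reader = csv.DictReader(io.StringIO(content), delimiter=DELIMITERS[extension])
    return [dict(row) for row in reader]


rows = parse_file("data.tsv", "id\tname\n1\tfoo\n")
print(rows)  # [{'id': '1', 'name': 'foo'}]
print(parse_file("data.bin", "anything", raw=True))  # anything
```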

Query

Note

To use the query report you must install the sql dependency (for SQLAlchemy): pip install laika-lib[sql]. You also need the libraries required to access your specific database.

For postgres: pip install laika-lib[postgres]

For Presto (PyHive): pip install laika-lib[presto]

It has only been tested with PostgreSQL and Presto, but it should work with all databases supported by SQLAlchemy.

type: query. This report runs a query against a database. These are the configurations:

query_file: path to a file that contains plain SQL code.
connection: name of the connection to use.
variables: a dictionary with values to replace in the query code. You can find further explanation in Query templating.

Example of a query report:

{
  "name": "my_shiny_query",
  "type": "query",
  "query_file": "/my/dir/query.sql",
  "connection": "local",
  "variables": {
    "foo": "bar"
  },
  "results": [...]
}

Bash Script

type: bash. This report executes a bash command and reads its output. You can interpret this report as the read_output part of this example:

$ bash_script | read_output

These are the configurations:

script: command to execute.
script_file: path to the file with the bash script to execute. If script is defined, this field is ignored.
result_type: format of the output data. Can be csv, json or raw. With raw, the content is not converted and is passed as is to the result. The json format is explained below.

Example bash script report:

{
  "name": "some_bash_script",
  "type": "bash",
  "script_file": "some_script.sh",
  "result_type": "json",
  "results": [...]
}
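
A minimal sketch of the flow described above: run a command, capture its output, and convert it according to result_type. The names `run_script` and `parse_output` are hypothetical, and the json branch here uses the standard library instead of pandas, so it only approximates what laika does.

```python
import csv
import io
import json
import subprocess


def run_script(script):
    """Run a command through bash and return its standard output as text."""
    completed = subprocess.run(
        ["bash", "-c", script], capture_output=True, text=True, check=True
    )
    return completed.stdout


def parse_output(output, result_type):
    """Convert script output according to result_type (csv, json or raw)."""
    if result_type == "raw":
        return output  # passed along unchanged
    if result_type == "json":
        return json.loads(output)
    if result_type == "csv":
        return [dict(row) for row in csv.DictReader(io.StringIO(output))]
    raise ValueError(f"unsupported result_type: {result_type!r}")


print(parse_output("a,b\n1,2\n", "csv"))      # [{'a': '1', 'b': '2'}]
print(parse_output('{"a": [1, 2]}', "json"))  # {'a': [1, 2]}
```

On a machine with bash available, the two pieces combine as `parse_output(run_script("some_script.sh"), "json")`.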

Bash Script json format

JSON data is converted to a pandas DataFrame using the pd.read_json function (Docs). These are some examples of the formats it accepts:

Example 1 (all arrays must have the same size):

{
  "column_1": ["data_row_1", "data_row_2", "data_row_3"],
  "column_2": ["data_row_1", "data_row_2", "data_row_3"],
  ...
}

Example 2:

[
  {
    "column_1": "data_row_1",
    "column_2": "data_row_1",
    "column_3": "data_row_1"
  },
  {
    "column_1": "data_row_2",
    "column_3": "data_row_2"
  }
  ...
]
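
To make the relationship between the two shapes concrete, the sketch below normalises both into a list of rows using only the standard library. It is an illustration of the data layouts, not a reimplementation of pd.read_json: `to_rows` is a hypothetical helper, and where it fills a missing key with None, pandas would typically show NaN.

```python
import json


def to_rows(data):
    """Normalise either accepted shape into a list of row dictionaries."""
    if isinstance(data, dict):
        # Column-oriented: {"column": [values, ...]}, all lists equally long.
        lengths = {len(values) for values in data.values()}
        if len(lengths) > 1:
            raise ValueError("all arrays must have the same size")
        size = lengths.pop() if lengths else 0
        return [{col: data[col][i] for col in data} for i in range(size)]
    # Record-oriented: [{"column": value, ...}, ...]; keys may be missing.
    columns = []
    for record in data:
        for col in record:
            if col not in columns:
                columns.append(col)
    return [{col: record.get(col) for col in columns} for record in data]


by_column = json.loads('{"column_1": ["a", "b"], "column_2": ["c", "d"]}')
by_record = json.loads('[{"column_1": "a", "column_2": "c"}, {"column_1": "b"}]')

print(to_rows(by_column))
# [{'column_1': 'a', 'column_2': 'c'}, {'column_1': 'b', 'column_2': 'd'}]
print(to_rows(by_record))
# [{'column_1': 'a', 'column_2': 'c'}, {'column_1': 'b', 'column_2': None}]
```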

Download From Google Drive

Note

To use drive report you must install drive dependency: pip install laika-lib[drive]

type: drive. This report downloads a file from Google Drive. It uses file parsing logic from the File report.

Configuration:

profile: Name of the profile to use. Credentials must belong to a service account with access to Google Drive's API.
grant: email of a grant account, on whose behalf the document will be downloaded. The grant account must have access to the specified folder.
filename: name of the file to download.
folder: directory in which the report will search for the specified file.
folder_id: Google Drive's id of the above folder. If specified, the folder option is ignored. It's useful if there is more than one folder with the same name.
subfolder: optional. If specified, the report looks for a subfolder inside the folder and, if found, looks there for filename. In other words, it looks for this structure: <folder>/<subfolder>/<filename>
file_id: Google Drive's id of the file to download. If specified, all other file options are ignored.
start_timeout, max_timeout, retry_status_codes: drive API calls sometimes fail with 500 errors. To work around this behaviour, in case of error the call is retried after waiting start_timeout (2 by default) seconds, doubling the waiting time after each error until reaching max_timeout (300 by default). If the error persists after that, the exception will be raised. retry_status_codes is a list of extra status codes to retry after, [429] by default (429 is “too many requests”).

Example of a drive report:

{
  "type": "drive",
  "profile": "drive_service_account",
  "grant": "me@mail.com",
  "folder_id": "my_folder_id",
  "folder": "TestFolder",
  "subfolder": "TestSubFolder",
  "file_id": "my_file_id",
  "filename":"file_to_download.xlsx"
}
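
The retry behaviour controlled by start_timeout, max_timeout and retry_status_codes can be sketched as follows. This is a generic exponential backoff written for illustration, not laika's code: `ApiError` and `call_with_retries` are hypothetical, treating every status of 500 or above as retryable is an assumption, and so is the exact point at which the loop gives up.

```python
import time


class ApiError(Exception):
    """Hypothetical error carrying the HTTP status code of a failed call."""

    def __init__(self, status_code):
        super().__init__(f"API call failed with status {status_code}")
        self.status_code = status_code


def call_with_retries(call, start_timeout=2, max_timeout=300,
                      retry_status_codes=(429,), sleep=time.sleep):
    """Retry call() with a doubling wait until the wait exceeds max_timeout."""
    wait = start_timeout
    while True:
        try:
            return call()
        except ApiError as error:
            retryable = (error.status_code >= 500
                         or error.status_code in retry_status_codes)
            if not retryable or wait > max_timeout:
                raise  # give up and surface the original error
            sleep(wait)
            wait *= 2


attempts = []
waits = []


def flaky():
    """Fail three times with a 500, then succeed."""
    attempts.append(1)
    if len(attempts) < 4:
        raise ApiError(500)
    return "file contents"


# Record the waits instead of actually sleeping.
print(call_with_retries(flaky, sleep=waits.append))  # file contents
print(waits)  # [2, 4, 8]
```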

Download From S3

Note

To use S3 report you must install s3 dependency: pip install laika-lib[s3]

type: s3. This report downloads a file from Amazon S3. It uses file parsing logic from the File report. In order to use this report, you have to install boto3.

Configuration:

profile: Name of the profile to use (a laika profile, not to be confused with AWS profiles). The credentials file of the specified profile must contain the data to be passed to the Session constructor. Example of a minimal AWS credentials file for laika:

{
  "aws_access_key_id": "my key id",
  "aws_secret_access_key": "my secret access key"
}

bucket: s3 bucket to download the file from.
filename: File to download. This config is the key of the file in bucket.

Example of an s3 report:

{
  "name": "s3_report_example",
  "type": "s3",
  "profile": "my_aws_profile",
  "bucket": "some.bucket",
  "filename": "reports/custom_report.csv",
  "results": [...]
}
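
The way the credentials file, bucket and filename fit together can be sketched with boto3 as below. This is an assumption about the general approach, not laika's implementation: `load_credentials` and `download_from_s3` are hypothetical names, and boto3 is imported lazily so the credential-loading part runs without it.

```python
import json
import os
import tempfile


def load_credentials(path):
    """Read a credentials file into keyword arguments for boto3's Session."""
    with open(path) as f:
        return json.load(f)


def download_from_s3(credentials, bucket, filename):
    """Fetch an object's bytes; bucket and filename map to Bucket and Key."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    session = boto3.Session(**credentials)
    response = session.client("s3").get_object(Bucket=bucket, Key=filename)
    return response["Body"].read()


# Demo of the credentials part only, using a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "aws_credentials.json")
    with open(path, "w") as f:
        json.dump({"aws_access_key_id": "my key id",
                   "aws_secret_access_key": "my secret access key"}, f)
    credentials = load_credentials(path)

print(sorted(credentials))  # ['aws_access_key_id', 'aws_secret_access_key']
```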

Redash

type: redash. This report downloads a query result from the Redash API. These are the configurations:

redash_url: the url of your Redash instance.
query_id: id of the query to download. You can get it from the query's url: its last part is the id (for example, for https://some.redash.com/queries/67, 67 is the id).
api_key: token to access the query, either for a user or for a query. You can find the user's token in the profile; the token for a query can be found in the source page.
refresh: True if you want an updated report. Important: For refresh to work the api_key must be of user type.
parameters: Dictionary of query parameters. They should be written as they are defined in the query. The p_ needed for the url will be prepended in the report.

Example of a redash report:

{
  "name": "some_redash_query",
  "type": "redash",
  "api_key": "some api key",
  "query_id": "123",
  "redash_url": "https://some.redash.com",
  "refresh": true,
  "parameters": {
      "hello": "world"
  },
  "results": [...]
}
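
The p_ prefixing of parameters can be illustrated with the sketch below, which builds a results URL from the report configuration. `build_results_url` is a hypothetical helper, and the `/api/queries/<id>/results.json` path is an assumption about the Redash endpoint, so check it against your Redash version.

```python
from urllib.parse import urlencode


def build_results_url(redash_url, query_id, api_key, parameters=None):
    """Build a results URL, prefixing each query parameter with p_."""
    query = {"api_key": api_key}
    for name, value in (parameters or {}).items():
        query["p_" + name] = value
    path = f"{redash_url.rstrip('/')}/api/queries/{query_id}/results.json"
    return f"{path}?{urlencode(query)}"


url = build_results_url(
    "https://some.redash.com", "123", "some api key", {"hello": "world"}
)
print(url)
# https://some.redash.com/api/queries/123/results.json?api_key=some+api+key&p_hello=world
```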

Adwords

Note

To use adwords report you must install adwords dependency: pip install laika-lib[adwords]

type: adwords. This report is generated by the Google Adwords API. To use it, you will need to install googleads. The configurations are:

profile: Name of the profile to use. The credentials file is a .yaml; you can find out how to generate it in the adwords API tutorial.
report_definition: the definition of the report, which will be passed to the DownloadReport method of the googleads API. You will normally define the fields reportName, dateRangeType, reportType, downloadFormat and selector, but these vary depending on the report type.
reportName: To avoid repeating report definitions, you can specify this name and reuse the definition. In other words, you can have multiple reports with the same name but only one report_definition, which will be used for all of them.
dateRangeType: if you use a report_definition from another report, you can override the date range it uses with this configuration. Here you can read more about the date range types you can choose from.
date_range: if dateRangeType is set to CUSTOM_DATE, you can define a custom range of dates to extract. The definition must be a dictionary with min and max values. In both you can use relative dates with Filenames templating.
client_customer_ids: Id or list of ids of the adwords customers whose data you want in the report.

Example of an adwords report:

{
  "name": "some_adwords_report",
  "type": "adwords",
  "date_range": {"min": "{Y-1d}{m-1d}{d-1d}", "max": "{Y-1d}{m-1d}{d-1d}"},
  "client_customer_ids": "123-456-7890",
  "report_definition": {
    "reportName": "Shopping Performance Last Month",
    "dateRangeType": "CUSTOM_DATE",
    "reportType": "SHOPPING_PERFORMANCE_REPORT",
    "downloadFormat": "CSV",
    "selector": {
        "fields": [
            "AccountDescriptiveName",
            "CampaignId",
            "CampaignName",
            "AdGroupName",
            "AdGroupId",
            "Clicks",
            "Impressions",
            "AverageCpc",
            "Cost",
            "ConvertedClicks",
            "CrossDeviceConversions",
            "SearchImpressionShare",
            "SearchClickShare",
            "CustomAttribute1",
            "CustomAttribute2",
            "Brand"
        ]
    }
  },
  "results": [...]
}
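
The relative dates used in date_range follow the syntax documented in Filenames templating. As a rough picture of how a value such as "{Y-1d}{m-1d}{d-1d}" might expand, the sketch below assumes that each placeholder names a date component (Y, m or d) of today minus the given number of days. That reading, and the names `render`, `DIRECTIVES` and `PLACEHOLDER`, are assumptions made for this example; the templating documentation is the authority.

```python
import re
from datetime import date, timedelta

# Assumed mapping from template letter to strftime directive.
DIRECTIVES = {"Y": "%Y", "m": "%m", "d": "%d"}
PLACEHOLDER = re.compile(r"\{([Ymd])-(\d+)d\}")


def render(template, today):
    """Replace each {X-Nd} placeholder with component X of today minus N days."""
    def substitute(match):
        component, days = match.group(1), int(match.group(2))
        shifted = today - timedelta(days=days)
        return shifted.strftime(DIRECTIVES[component])
    return PLACEHOLDER.sub(substitute, template)


today = date(2024, 1, 1)
print(render("{Y-1d}{m-1d}{d-1d}", today))    # 20231231
print(render("{Y-1d}-{m-1d}-{d-1d}", today))  # 2023-12-31
```

Under this reading, repeating the offset in every placeholder is what keeps the year and month correct when the shift crosses a month or year boundary.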

Facebook Insights

type: facebook. Retrieves data from Facebook's Insights API. The report is requested as an asynchronous job and is polled for completion every few seconds.

Configuration:

profile: Name of the profile to use. The credentials file must contain an access token with at least the read_insights permission. You can generate it in Facebook's developers panel for your app. Example facebook credentials:

{
  "access_token": "..."
}

object_id: Facebook’s object id from which you want to obtain the data.
params: Set of parameters that will be added to the request. Check the example report to know what values are used by default, consult Facebook’s Insights API documentation to discover what parameters you can use.
sleep_per_tick: Number of seconds to wait between requests to Facebook API to check if the job is finished.
since: Starting date for a custom date range. Will only be used if date_preset, time_range or time_ranges are not present among report parameters. You can set relative dates using Filenames templating.
until: Same as since, but for the ending date.

Example of facebook report:

{
    "name": "my_facebook_insights_report",
    "type": "facebook",
    "profile": "my_facebook_profile",
    "object_id": "foo_1234567890123456",
    "since": "{Y-1d}-{m-1d}-{d-1d}",
    "until": "{Y-1d}-{m-1d}-{d-1d}",
    "params": {
        "level": "ad",
        "limit": 10000000,
        "filtering": "[{\"operator\": \"NOT_IN\", \"field\": \"ad.effective_status\", \"value\": [\"DELETED\"]}]",
        "fields": "impressions,reach",
        "action_attribution_windows": "28d_click"
    },
    "results": [...]
}
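
The role of sleep_per_tick in the asynchronous flow can be pictured with a plain polling loop. The sketch below does not call Facebook's API: `wait_for_job` is a hypothetical helper, `is_finished` stands in for whatever status check the real client performs, and the default of 5 seconds and the `max_ticks` limit are assumptions.

```python
import time


def wait_for_job(is_finished, sleep_per_tick=5, max_ticks=100, sleep=time.sleep):
    """Poll is_finished() every sleep_per_tick seconds until it returns True."""
    for tick in range(max_ticks):
        if is_finished():
            return tick
        sleep(sleep_per_tick)
    raise TimeoutError("job did not finish in time")


# Stand-in for the remote job status: finished on the third check.
statuses = iter([False, False, True])
pauses = []
ticks = wait_for_job(lambda: next(statuses), sleep_per_tick=1, sleep=pauses.append)
print(ticks, pauses)  # 2 [1, 1]
```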

RTBHouse

Note

To use rtbhouse report you must install rtbhouse dependency: pip install laika-lib[rtbhouse]

type: rtbhouse. Downloads marketing costs report from RTBHouse API. Reported campaigns (advertisers) are all those created by the account.

Configuration:

profile: Name of profile to use. Credentials must be a json containing username and password fields.
day_from: Starting date for the period to retrieve costs for. You can set a relative date using Filenames templating.
day_to: Same as day_from, but for the ending date.
group_by and convention_type: Optional parameters to send to RTBHouse.
campaign_names: Mapping from campaign hash to a readable name for the resulting report.
column_names: Mapping to rename columns in the resulting report.

Example of rtbhouse report:

{
  "name": "my_rtbhouse_report",
  "type": "rtbhouse",
  "profile": "my_rtbhouse_profile",
  "group_by": "day",
  "convention_type": "ATTRIBUTED",
  "day_from": "{Y-1d}-{m-1d}-{d-1d}",
  "day_to": "{Y-1d}-{m-1d}-{d-1d}",
  "campaign_names": {
    "1234567890": "Some readable campaign name"
  },
  "column_names": {
    "hash": "CampaignID",
    "name": "Campaign",
    "campaignCost": "Cost",
    "day": "Date"
  },
  "results": [...]
}
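
How campaign_names and column_names might reshape the downloaded data is sketched below. The input row layout (hash, day, campaignCost) is inferred from the column_names example and is an assumption, as are the helper name `apply_mappings` and the choice to fall back to the hash when a campaign has no readable name.

```python
def apply_mappings(rows, campaign_names, column_names):
    """Add readable campaign names, then rename columns in each row."""
    renamed = []
    for row in rows:
        row = dict(row)
        # Assumption: unknown hashes fall back to the hash itself.
        row["name"] = campaign_names.get(row["hash"], row["hash"])
        renamed.append(
            {column_names.get(key, key): value for key, value in row.items()}
        )
    return renamed


rows = [{"hash": "1234567890", "day": "2024-01-01", "campaignCost": 12.5}]
result = apply_mappings(
    rows,
    campaign_names={"1234567890": "Some readable campaign name"},
    column_names={"hash": "CampaignID", "name": "Campaign",
                  "campaignCost": "Cost", "day": "Date"},
)
print(result[0]["Campaign"])  # Some readable campaign name
print(list(result[0]))        # ['CampaignID', 'Date', 'Cost', 'Campaign']
```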

Rakuten

type: rakuten. Downloads a report from Rakuten marketing platform by name.

Configuration:

profile: Name of profile to use. Credentials must be a json containing token key, with a token to access Rakuten API.
report_name: Existing report to download from the platform.
filters: A set of filters to send to the API. Must be a dictionary; you can use Filenames templating on the values.

Example of rakuten report:

{
  "name": "my_rakuten_report",
  "type": "rakuten",
  "profile": "my_rakuten_profile",
  "report_name": "some-report",
  "filters": {
    "start_date": "{Y-10d}-{m-10d}-{d-10d}",
    "end_date": "{Y-1d}-{m-1d}-{d-1d}",
    "include_summary": "N",
    "date_type": "transaction"
  }
}

Module

type: module. Allows you to use a Python module with a custom report class to obtain the data. This module will be loaded dynamically and executed. Currently it has the same configuration as the module result, which can be confusing.

Configuration:

result_file: Path to the Python file.

result_class: Name of the class to use as result inside the Python file. This class must inherit from the Report class and define a process method, which should normally return the report data. Simple example of a custom report class:

from laika.reports import Report

class FooResult(Report):

    def process(self):
        # custom_filename is an extra key set in the report definition
        filename = self.custom_filename
        # do_stuff is a placeholder for your own processing function
        with open(filename) as f:
            return do_stuff(f.read())

This report is executed like any other report, and all the extra configuration you define is available to it.

Warning

This report loads and executes arbitrary code, which carries security risks. Always check custom modules before using them.

Example of a module report definition:

{
  "type": "module",
  "result_file": "./some_folder/my_custom_report.py",
  "result_class": "MyReport",
  "my_custom_config": "value"
}
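
Dynamic loading of a class from result_file and result_class can be sketched with the standard importlib machinery, as below. This shows one common way to do it and is not taken from laika's source: `load_class` is a hypothetical helper, and the demo class does not inherit from laika's Report so that the example runs on its own.

```python
import importlib.util
import os
import tempfile


def load_class(path, class_name):
    """Import the Python file at path and return the named class from it."""
    spec = importlib.util.spec_from_file_location("custom_report_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, class_name)


# Demo with a throwaway module; a real one would subclass laika's Report.
source = "class MyReport:\n    def process(self):\n        return 'data'\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_custom_report.py")
    with open(path, "w") as f:
        f.write(source)
    report_class = load_class(path, "MyReport")

print(report_class().process())  # data
```

Because `exec_module` runs everything at the top level of the file, the warning above applies to this sketch as well.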