Accessing Databases

Accessing Databases


In this video, we’ll explain how to connect
to a database in a KNIME workflow. To connect to a database we need a connector
node that creates the connection to the database via its JDBC driver. The KNIME Database Extension provides a setof nodes for connecting to all JDBC compliant databases. The nodes are a part of KNIME Analytics Platform
installation, so no additional installation is necessary. Many file-based and server-based databases, such as SQLite and MySQL, have dedicated connector nodes. These connector nodes already contain the
necessary JDBC drivers and provide configuration options that are tailored to the specific
database. Dedicated connector nodes for big data platforms,
such as Apache Hive or Impala, can be obtained by installing the KNIME Big Data Extension
which is also open source. If no dedicated connector node exists for
your database, you can use the generic DB Connector node. Let’s have a look at an example workflow
where we connect to a database. Here we want to read data from a MySQL database,
so we can use the MySQL Connector node. In the configuration dialog of the MySQL Connector
node, in this first dropdown menu here, we can change the database dialect. In the second dropdown menu, we can change
the JDBC driver. Here below, we define the hostname and port
where the database resides. Next, we enter the name of the database that
we want to access. In the authentication section, we provide
the credentials to access this database. One option is to provide the username and
password here in this field. However, since entering and storing a username
and password directly in the dialog might lead to security issues, KNIME allows you
to provide credentials from outside the node, via a flow variable connection coming from
the Credentials Configuration node. The available credentials are shown in this
menu. The other tabs are for advanced, optional
settings. In the JDBC Parameters tab we can add custom
JDBC driver connection parameters. In the Advanced tab we can configure more
settings, such as the query for validating the connection. In the Input Type Mapping and Output Type
Mapping tabs we can define which data types in KNIME Analytics Platform correspond to
the data types associated with our database. However, the settings that we already defined
in the Connection Settings tab are often enough to establish the connection – so we click
“OK”. We can now access the MySQL database. The output of the node shows some information
about the connection that we just created. Notice that the connection between the dedicated
connector node and the JDBC driver does not exist automatically for MS SQL Server and
Amazon Redshift databases due to licensing restrictions. Their JDBC drivers have to be installed separately
as KNIME extensions. After that, the dedicated connector node will
automatically connect to the appropriate JDBC driver and provide it in the configuration
dialog. For databases that don’t have dedicated
connector nodes, we can use the generic DB Connector node which can connect to arbitrary
JDBC compliant databases. In such a case, it is necessary to register the corresponding JDBC driver in KNIME Analytics Platform. Let’s say we want to connect to a SAP HANA
database that doesn’t have a dedicated connector node. We can register its JDBC driver in the dialog
that opens if we go to File → Preferences → KNIME → Databases. We click “Add”, and in the dialog that
opens, we provide a unique ID and name of the JDBC driver. In this menu on the right, we define the database
type, which is SAP HANA in our case. Our database type is not listed in the menu,
so we select the “default” option. In this field below, we specify the URL template. We can click this question mark to see the
available options for creating the URL template. Finally, we provide the path to the JDBC driver
in our system. The driver can be downloaded from the website
of the database provider. We click “OK”, and as you can see, the
newly added driver appears in the list. Now, let’s open the configuration dialog of
the DB Connector node. It looks much the same as the configuration
dialog of the MySQL Connector node. We select the newly registered SAP HANA driver
here in the “Driver Name” menu. There are other settings we define as well,
just as when we connected to the MySQL database. Thus the connection to the SAP HANA database
has been created successfully. Now that we are connected to a database, we
could use the DB Table Selector node to select a table in the database, and continue with
many in-database manipulation operations using other DB nodes. At any point, we could also import the data
back into KNIME using the DB Reader node. In this video, we have explained how to connect
to a database using KNIME DB nodes. We have shown how to connect to a database
via its dedicated connector node, and how to connect to any JDBC compliant database
in general via the DB Connector node. We have also explained how to register a new
JDBC driver into KNIME Analytics Platform.