пятница, 19 сентября 2014 г.

Microstrategy and Teradata Best Practices for Performance Tuning


Distributing Data Across AMPs
Teradata is parallel database system: the data is distributed across a set of nodes, commonly called AMPs, that process queries in parallel, with each AMP doing a portion of the work. Each table in the database is distributed by rows across all AMPs in the system. Every table is created with a primary index that defines how the data will be distributed across AMPs. Even if a primary index is not explicitly defined, Teradata will assume that the first column in the table should be used for data distribution. Teradata uses the columns in the primary index and applies a hash function to map each row of the table to an AMP in the system.
A critical factor in selecting a primary index is to ensure even distribution of data across AMPs. Teradata’s parallel processing capabilities are most efficient when rows are distributed evenly. The primary index is also relevant for data access and joins. Faster queries are possible if they include all rows in the primary index; direct joins with other tables can be faster if the join occurs on the columns of the primary indexes of the tables. Selecting the best primary indexes is one of the most important physical design decisions in a Teradata system.
For a MicroStrategy system in particular, data distribution is important in two ways. Like any SQL application, MicroStrategy transparently takes advantage of primary indexes defined on tables. So good data distribution is important to MicroStrategy applications simply because it is a key component of overall system performance, i.e. MicroStrategy query response time is very dependent on the primary indexes selected. Second, because MicroStrategy sometimes creates intermediate tables, we should be also aware of how the data for those tables is being distributed – it can make a difference in the response time for MicroStrategy reports that create intermediate tables. See more discussion below under “VLDB Optimizations.”
The Teradata Database Design manual is a good source for additional detail on primary indexes and distributing data across AMPs. 

Using Partitioned Primary Indexes
Teradata introduced partitioned primary indexes (PPIs), a new indexing mechanism used in physical database design. When using PPI’s, rows within each AMP are partitioned and then sorted by their row hash value within each partition. The net effect is that queries can run faster by accessing only the rows of qualifying partitions.
PPI’s are especially helpful for queries based on range access, such as date ranges. A common scenario is to partition by something in the Time dimension, for example at a daily, weekly, or monthly level.
In general. PPI’s make sense when the primary index has been selected for its data distribution properties, but not as much for joins and data access. If the PI is used often for joins and data access, then it is better if the columns used in the partitioning expression are part of the primary index. See the Teradata Database Design manual for more detailed discussion of when to use PPI’s.
MicroStrategy-generated queries transparently take advantage of partitioned primary indexes defined on any tables defined as part of the MicroStrategy schema. PPI’s are not allowed for volatile tables or global temporary tables, so the SQL Engine does not generate partitioned primary indexes when creating temporary tables. Primary indexes on temporary tables provide sufficient control for performance tuning.
Keeping Statistics Up To Date
One of the most important sources of information the Teradata optimizer uses for choosing access plans is the set of statistics Teradata collects about the system and the data in it. Database statistics include  “data demographics” such as the number of rows in each table, the number of unique values within each column, the skew and distribution of values within each column, etc. The optimizer uses this information when selecting the best access plan from among all the possible access plans that could satisfy a given query.
It is important for the statistics to be updated as frequently as practical. Whenever new data is loaded or new indexes are built or any other significant change occurs to data, the statistics should be updated to reflect the change. Note that out-of-date statistics can be worse than not collecting statistics at all.
In Teradata, statistics are particularly important because the optimizer does not allow the user to “override” its decisions through constructs like hints or rewritten SQL. The optimizer will select the best plan for a given query, but its decisions are only as good as the information it uses to make them.
Teradata statistics are fairly transparent to a MicroStrategy application, since the optimizer is their main consumer. However, keeping statistics up to date is a critical factor in overall system performance, i.e. MicroStrategy queries will be affected by out of date statistics. In addition, some MicroStrategy reports create intermediate tables and insert records into them, so it can be important to collect statistics on these intermediate tables as well. See the VLDB Optimizations section for details on how to do this.
Planning a pre-aggregation strategy and using Aggregate Join Indexes
Pre-aggregating data is a common and powerful way to improve end user query response times. Rather than aggregating many, many rows at query time, pre-summarization allows the database to perform row access and aggregation ahead of time, satisfying the query at request time much faster. Data warehouse practitioners have long used physical summary tables to pre-aggregate data in a ROLAP environment. MicroStrategy is an aggregate aware application that allows end users to query physical summary tables without needing to specify which table to use to satisfy the query.
Teradata also supports aggregate join indexes, a database server feature that provides for the creation, maintenance, and automatic navigation of aggregate tables or pre-joined tables. This makes the use of aggregate tables entirely transparent to the end user and to the MicroStrategy developer. With aggregate join indexes, Teradata will rewrite the query to use summary or pre-joined tables, provided that the optimizer determines its cost to be less than the original query. Query rewrite in Teradata is based on the SQL text itself or on rewrite conditions explicitly determined by Teradata.
Any MicroStrategy generated SQL will be submitted to Teradata and will be available for query rewrite. Provided that the proper conditions are set up on Teradata, MicroStrategy will transparently take advantage of aggregate join indexes. There are some cases in which physical tables will be used more often than aggregate join indexes. MicroStrategy can take advantage of its knowledge of hierarchical relationships to select from physical summary tables. For example, suppose a base table exists at the Day level and a user submits a query at the Year level. If a physical table is created at the Month level, the MicroStrategy application will access that table to fulfill the Year level query. If an equivalent aggregate join index is built at the Month level, the Year level query will not be rewritten to use the join index.