Optimization
From UG
(→SOW 12) |
|||
Line 190: | Line 190: | ||
This can be rewritten to use "union" instead of "or". That gives substantial speedup: | This can be rewritten to use "union" instead of "or". That gives substantial speedup: | ||
- | + | select count(distinct C) as C | |
- | select count(distinct C) as C | + | from ( |
- | from ( | + | select gs.Id as C |
- | select gs.Id as C | + | from viewShipment gs |
- | from viewShipment gs | + | inner join tblShipmentLog sl on ( gs.Id=sl.ShipmentId ) |
- | inner join tblShipmentLog sl on ( gs.Id=sl.ShipmentId ) | + | where sl.FieldName = 'ClientCompany' |
- | where sl.FieldName = 'ClientCompany' | + | |
and sl.ChangeTime > '2012-01-28 23:01:04' | and sl.ChangeTime > '2012-01-28 23:01:04' | ||
and gs.IsDeleted = 0 | and gs.IsDeleted = 0 | ||
and( gs.IsApprovedByPlanner is null | and( gs.IsApprovedByPlanner is null | ||
- | + | or gs.IsApprovedByPlanner = 1 ) | |
and( gs.ClientCompanyId = '1701' or gs.VisibilityClientCompanyId1 = '1701' | and( gs.ClientCompanyId = '1701' or gs.VisibilityClientCompanyId1 = '1701' | ||
- | + | or gs.VisibilityClientCompanyId2 = '1701' or gs.VisibilityClientCompanyId3 = '1701' ) | |
and gs.Cancelled = 0 | and gs.Cancelled = 0 | ||
- | union all | + | union all |
- | select gs.Id as C | + | select gs.Id as C |
- | from viewShipment gs | + | from viewShipment gs |
- | where gs.CreatedOnDate > '2012-01-28 23:01:04' | + | where gs.CreatedOnDate > '2012-01-28 23:01:04' |
and gs.IsDeleted = 0 | and gs.IsDeleted = 0 | ||
and( gs.IsApprovedByPlanner is null | and( gs.IsApprovedByPlanner is null | ||
- | + | or gs.IsApprovedByPlanner = 1 ) | |
and( gs.ClientCompanyId = '1701' or gs.VisibilityClientCompanyId1 = '1701' | and( gs.ClientCompanyId = '1701' or gs.VisibilityClientCompanyId1 = '1701' | ||
- | + | or gs.VisibilityClientCompanyId2 = '1701' or gs.VisibilityClientCompanyId3 = '1701' ) | |
and gs.Cancelled = 0 | and gs.Cancelled = 0 | ||
Revision as of 14:20, 30 May 2012
Contents |
Info
- parent mantis: 0000028: [Optimization] .... parent
- mantis category: Optimization
Requirements
Keep system speed/performance at the normal levels NOW and IN THE FUTURE.
It is assumed that to satisfy above we have to:
- Optimize existing code, DB, etc to
- Develop strategy for the future development
[Perry: 2/22/2012 comments/ideas regarding approach and strategy]
- Separate web/app server from database server.
- Introduce scheduled batch reporting features for processor/memory intensive reports
- Develop plan to proactively monitor server utilization (Evaluate available tools - I know BMC Software has tools to report on various data points across a specified period of time)
- Look for MySQL and Tomcat tools to monitor SQL statements and long running processes. Evaluate database explain plans - optimize SQL statements and introduce indexes where necessary.
SOW 0
- 0001357: (Misc) Create CT2 2015 DB (Ops/Acc modules) Test speed, optimize (requirements for how fast each CT2 table will grow attached)
For every table: a) record how many records are now b) find out (interview BAs) - how many records will be there at the end of 2010 c) estimate rate of growth per year after 2010 d) post this info into http://ct.jaguarfreight.com/wiki/Intro_into_CT2_DB [^] [^] Based on the info above CT2 Architect will: a) create database similar in size to dec 2010 levels b) estimate speed of application c) compare above with target d) optimize DB and algorithms to reach the target speed
SOW 1
* 0002233: (CT2 Misc) Ongoing DB optimization (SOW 1)
Implementation Notes SOW 1
- P/l for mult cts report refactoring
- Billing required report refactoring
- P/l report for CT group and for Master - lines from one PI for different cts was grouped into one line per Marc's request
SOW 2
- 0003312: (CT2 Misc) Ongoing DB optimization (SOW 2)
Implementation Notes SOW 2
1. Refactored ChargeCodes to ShipmentInvoices relation table (direct access without using tblTableOfCharges table). 2. Reworked ChargeCode persistence methods to speed-up Invoices access. Was improved speed of reports generation report p&l for multiple cts "Special View: Charge Code Group" optimized. this report now works 10-20 times faster
SOW 3
Implementation Notes SOW 3
- Refactor IsBasicApproval field (Boolean) to PlannerType field. Will simplify Queries logic.
- Migrate current Planner Shipment Authorization Status fields to tblGenericShipment table
- Avoid joining pair of long tables tblGenericShipment and tblShipmentAuthorizationStatusHistory
- improve performance of Planner/Shipper portal and Reporting
- Refactored Query class
- removed quotation marks of Integer Parameter in Queries composition. As Vlad noted, comparing Integer to Character values has significant negative effect on performance.
SOW 4
Kostya: I need to add indexes and constraints to PDF dosc tables - for individual cts and masters. To implement this I should drop some "garbage" records from these several tables so I need time to review existing records and remove all that not have links to shipments that correnctly database has. Only after this I can add needed constraints to the tables. This indexes and constraints will improve database integrity and force speed.
SOW 5 Re-architect EDI interface
- 0003523: Re-architect the interface: connect problem between CT2 & FTP Server for TMS, Air Status & EUAN BP.(related: 0003479)
Main Objective: Prevent a single FTP thread from indefinitely blocking other FTP threads due to any error condition (either connection problem, transmission problem, etc..). This applies to all FTP interfaces including TMS, Trendset, etc..
Proposed Solution:
- Add timeout to all FTP processes to return control back to the application after a predetermined timeout value. Timeout value should not be hard-coded, but either config files or database driven.
- Modify existing FTP processes that delete and then copy a file from/to the same server to issue rename/move commands instead. Waste a lot of bandwidth by deleting and then copying the same file over the same server.
- FTP process currently attempts to create directory to store invalid files for each session. Modify process to only issue this command after initial environment build. After an environment is built, then do not issue this command for every FTP session.
- (Nice-to-have and will not need to implement in this SOW) Modify process to not open/close FTP connection for each and every file. Keep an FTP session open for a set time period to process all requests for that cycle, and then close the connection.
SOW 6
Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
- 0003584: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 6)
1. blPDFReports is missing index on ShipmentId This causes occasional full rescan of a huge table
SOW 7
Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
- 0003585: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 7)
2.
- Query_time: 6 Lock_time: 0 Rows_sent: 0 Rows_examined: 2075813
INSERT INTO tblTempShipment0 (Id,ShipmentGroupId) (SELECT gs.Id, gs.ShipmentGroupId FROM viewShipment gs LEFT JOIN tblShipmentGroup sg ON gs.ShipmentGroupId=sg.Id WHERE gs.IsDeleted=0 AND (gs.IsApprovedByPlanner IS NULL OR gs.IsApprovedByPlanner=1) AND gs.IsLossAuthorized=0 GROUP BY gs.Id)
The statement depends on more than 90% of shipments in the database. Even there are about 90K shipments in the database, the statement touches about 2M.
As database grows the statement will take more and more to execute.
SOW 8
Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
- 0003586: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 8)
3.
- Query_time: 6 Lock_time: 0 Rows_sent: 1 Rows_examined: 4465332
SELECT COUNT(gs.Id) AS C FROM viewShipment gs WHERE (gs.TransportModeId BETWEEN 2 AND 9) AND gs.ActualPickupDate='0000-00-00' AND ((NOT gs.EstimatedPickupDate='0000-00-00' AND gs.EstimatedPickupDate<'2012-01-23') OR (gs.EstimatedPickupDate BETWEEN '2012-01-23' AND '2012-01-23' + INTERVAL 7 DAY)) AND gs.IsDeleted=0 AND (gs.IsApprovedByPlanner IS NULL OR gs.IsApprovedByPlanner=1) AND ((gs.TransportModeId='2' OR gs.TransportModeId ='3' OR gs.TransportModeId='4' OR gs.TransportModeId='5' OR gs.TransportModeId='6' OR gs.TransportModeId='8' OR gs.TransportModeId='9')) AND gs.Cancelled=0;
This query attempts to limit amount of data using 2012-01-23 cut off. However design of viewShipment prohibits that. Query touches 4.5M rows. The query will get slower as shipments are added to the database.
SOW 9
Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
- 0003587: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 9)
4. Design of viewShipment
It is complex, it seems that mySQL optimizer is often unable to understand nature of queries ran against this view. Then it falls back to conservative execution plan. Which causes all shipments to be touched and all extension tables. Which results in millions rows being accessed.
Perhaps more narrow view design (less joins and some unions)less problematic for the optimizer.
SOW 10
Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
- 0003588: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 10)
5. Design of viewShipment
Various timestamps in the view are calculated in a very complex manner ("case" expressions with functions in them). These timestamps could be very useful in limiting scope of the query (selecting only recent shipments, etc). However, even industry leading databases are generally unable deal with such cryptic column definitions. These columns needs to be indexed in such a way, that MySQL optimizer is able to make use of them.
Another way to look at this: If we look at a single shipment we either have complexity of maintaining explicit timestamps on each update of the shipment, but have less complex reports. Or simplify updates, but deal with complexity in reports. Which looks as if there was parity. There isn't in my view. Once shipment is completed it adds to complexity indefinitely into the future. We have much more completed shipments, than those in progress. Complexity should be where volume is lower - shipments updates.
SOW 11
Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
- 0003589: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 11)
6. Design of viewShipment
Driving table in the view is tblGenericShipment. It's preferable that other tables were joined to the driving table on their primary keys. Currently it's not the case. Very important tables tblOceanShipment, tblAirShipment, tblTruckingShipment all have duplicates(up to 6 fold) in them. This alone requires "distinct" added to some queries, which makes them slower.
When optimizer doesn't see that join condition necessarily returns at most one dependent row for one driving table row, it can choose suboptimal execution plan.
Besides tblOceanShipment, tblAirShipment, tblTruckingShipment following tables also need to change their primary key: tblShipment2ClientCompanyGroupRelation, tblShipment2ClientCompanyRelation
Possibly there are more such tables.
SOW 12
Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
- 0003590: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 12)
7.
- Time: 120129 14:01:16
- User@Host: root[root] @ [127.0.0.1]
- Query_time: 62 Lock_time: 0 Rows_sent: 1 Rows_examined: 85612
SELECT COUNT(gs.Id) AS C FROM viewShipment gs WHERE (gs.CreatedOnDate>'2012-01-28 23:01:04' OR gs.Id IN (SELECT ShipmentId FROM tblShipmentLog WHERE FieldName='ClientCompany' AND ChangeTime>'2012-01-28 23:01:04')) AND gs.IsDeleted=0 AND (gs.IsApprovedByPlanner IS NULL OR gs.IsApprovedByPlanner=1) AND (((gs.ClientCompanyId='1701' OR gs.VisibilityClientCompanyId1='1701' OR gs.VisibilityClientCompanyId2='1701' OR gs.VisibilityClientCompanyId3='1701'))) AND gs.Cancelled=0;
This can be rewritten to use "union" instead of "or". That gives substantial speedup:
select count(distinct C) as C from ( select gs.Id as C from viewShipment gs inner join tblShipmentLog sl on ( gs.Id=sl.ShipmentId ) where sl.FieldName = 'ClientCompany' and sl.ChangeTime > '2012-01-28 23:01:04' and gs.IsDeleted = 0 and( gs.IsApprovedByPlanner is null or gs.IsApprovedByPlanner = 1 ) and( gs.ClientCompanyId = '1701' or gs.VisibilityClientCompanyId1 = '1701' or gs.VisibilityClientCompanyId2 = '1701' or gs.VisibilityClientCompanyId3 = '1701' ) and gs.Cancelled = 0 union all select gs.Id as C from viewShipment gs where gs.CreatedOnDate > '2012-01-28 23:01:04' and gs.IsDeleted = 0 and( gs.IsApprovedByPlanner is null or gs.IsApprovedByPlanner = 1 ) and( gs.ClientCompanyId = '1701' or gs.VisibilityClientCompanyId1 = '1701' or gs.VisibilityClientCompanyId2 = '1701' or gs.VisibilityClientCompanyId3 = '1701' ) and gs.Cancelled = 0
Or even better. Instead of looking into both "CreatedOnDate" field and "tblShipmentLog" table, only one criterion could be used, by either
a) tblShipmentLog containing shipment creation record with FieldName='ALL created', or
b) UpdatedOnDate field be maintained, updated on shipment creation and then on each operation that results in population of tblShipmentLog (can be done initially with a trigger).
b)
SOW 13
Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
- 0003591: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 13)
SOW 14
Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
- 0003592: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 14)
SOW 15
Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
- 0003593: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 15)
SOW 16
Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
- 0003594: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 16)