Optimization

From UG

(Difference between revisions)
Jump to: navigation, search
(SOW 5 Re-architect EDI interface)
(SOW 16)
 
(14 intermediate revisions not shown)
Line 95: Line 95:
== SOW 6 ==
== SOW 6 ==
-
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages. This task has 11 separate points  Vlad makes for how to resolve these processing and Java.SQL messages.
+
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
*'''0003584: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 6)'''
*'''0003584: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 6)'''
-
1. tblPDFReports is missing index on ShipmentId
+
1.  
-
This causes occasional full rescan of a huge table  
+
 
 +
blPDFReports is missing index on ShipmentId
 +
This causes occasional full rescan of a huge table
== SOW 7 ==
== SOW 7 ==
 +
 +
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
*'''0003585: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 7)'''
*'''0003585: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 7)'''
Line 115: Line 119:
== SOW 8 ==
== SOW 8 ==
 +
 +
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
*'''0003586: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 8)'''
*'''0003586: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 8)'''
Line 121: Line 127:
# Query_time: 6 Lock_time: 0 Rows_sent: 1 Rows_examined: 4465332
# Query_time: 6 Lock_time: 0 Rows_sent: 1 Rows_examined: 4465332
SELECT COUNT(gs.Id) AS C FROM viewShipment gs WHERE (gs.TransportModeId BETWEEN 2 AND 9) AND gs.ActualPickupDate='0000-00-00' AND ((NOT gs.EstimatedPickupDate='0000-00-00' AND gs.EstimatedPickupDate<'2012-01-23') OR
SELECT COUNT(gs.Id) AS C FROM viewShipment gs WHERE (gs.TransportModeId BETWEEN 2 AND 9) AND gs.ActualPickupDate='0000-00-00' AND ((NOT gs.EstimatedPickupDate='0000-00-00' AND gs.EstimatedPickupDate<'2012-01-23') OR
-
(gs.EstimatedPickupDate BETWEEN '2012-01-23' AND '2012-01-23' + INTERVAL 7 DAY)) AND gs.IsDeleted=0 AND (gs.IsApprovedByPlanner IS NULL OR gs.IsApprovedByPlanner=1) AND ((gs.TransportModeId='2' OR gs.TransportModeId
+
(gs.EstimatedPickupDate BETWEEN '2012-01-23' AND '2012-01-23' + INTERVAL 7 DAY)) AND gs.IsDeleted=0 AND (gs.IsApprovedByPlanner IS NULL OR gs.IsApprovedByPlanner=1) AND ((gs.TransportModeId='2' OR gs.TransportModeId
='3' OR gs.TransportModeId='4' OR gs.TransportModeId='5' OR gs.TransportModeId='6' OR gs.TransportModeId='8' OR gs.TransportModeId='9')) AND gs.Cancelled=0;
='3' OR gs.TransportModeId='4' OR gs.TransportModeId='5' OR gs.TransportModeId='6' OR gs.TransportModeId='8' OR gs.TransportModeId='9')) AND gs.Cancelled=0;
This query attempts to limit amount of data using 2012-01-23 cut off. However design of viewShipment prohibits that. Query touches 4.5M rows. The query will get slower as shipments are added to the database.
This query attempts to limit amount of data using 2012-01-23 cut off. However design of viewShipment prohibits that. Query touches 4.5M rows. The query will get slower as shipments are added to the database.
 +
 +
== SOW 9 ==
 +
 +
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
 +
 +
*'''0003587: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 9)'''
 +
 +
4.
 +
 +
Design of viewShipment
 +
 +
It is complex, it seems that mySQL optimizer is often unable to understand nature of queries ran against this view. Then it falls back to conservative execution plan. Which causes all shipments to be touched and all extension tables. Which results in millions rows being accessed.
 +
 +
Perhaps more narrow view design (less joins and some unions)less problematic for the optimizer.
 +
 +
== SOW 10 ==
 +
 +
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
 +
 +
*'''0003588: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 10)'''
 +
 +
5.
 +
 +
Design of viewShipment
 +
 +
Various timestamps in the view are calculated in a very complex manner ("case" expressions with functions in them). These timestamps could be very useful in limiting scope of the query (selecting only recent shipments, etc). However, even industry leading databases are generally unable deal with such cryptic column definitions.
 +
These columns needs to be indexed in such a way, that MySQL optimizer is able to make use of them.
 +
 +
Another way to look at this:
 +
If we look at a single shipment we either have complexity of maintaining explicit timestamps on each update of the shipment, but have less complex reports. Or simplify updates, but deal with complexity in reports.
 +
Which looks as if there was parity. There isn't in my view. Once shipment is completed it adds to complexity indefinitely into the future. We have much more completed shipments, than those in progress. Complexity should be where volume is lower - shipments updates.
 +
 +
== SOW 11 ==
 +
 +
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
 +
 +
*'''0003589: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 11)'''
 +
 +
6.
 +
Design of viewShipment
 +
 +
Driving table in the view is tblGenericShipment. It's preferable that other tables were joined to the driving table on their primary keys. Currently it's not the case. Very important tables tblOceanShipment, tblAirShipment, tblTruckingShipment all have duplicates(up to 6 fold) in them. This alone requires "distinct" added to some queries, which makes them slower.
 +
 +
When optimizer doesn't see that join condition necessarily returns at most one dependent row for one driving table row, it can choose suboptimal execution plan.
 +
 +
Besides tblOceanShipment, tblAirShipment, tblTruckingShipment following tables also need to change their primary key:
 +
tblShipment2ClientCompanyGroupRelation, tblShipment2ClientCompanyRelation
 +
 +
Possibly there are more such tables.
 +
 +
== SOW 12 ==
 +
 +
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
 +
 +
*'''0003590: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 12)'''
 +
 +
7.
 +
# Time: 120129 14:01:16
 +
# User@Host: root[root] @ [127.0.0.1]
 +
# Query_time: 62 Lock_time: 0 Rows_sent: 1 Rows_examined: 85612
 +
SELECT COUNT(gs.Id) AS C FROM viewShipment gs WHERE (gs.CreatedOnDate>'2012-01-28 23:01:04' OR gs.Id IN (SELECT ShipmentId FROM tblShipmentLog WHERE FieldName='ClientCompany' AND ChangeTime>'2012-01-28 23:01:04')) AND gs.IsDeleted=0 AND (gs.IsApprovedByPlanner IS NULL OR gs.IsApprovedByPlanner=1) AND (((gs.ClientCompanyId='1701' OR gs.VisibilityClientCompanyId1='1701' OR gs.VisibilityClientCompanyId2='1701' OR gs.VisibilityClientCompanyId3='1701'))) AND gs.Cancelled=0;
 +
 +
This can be rewritten to use "union" instead of "or". That gives substantial speedup:
 +
select count(distinct C) as C
 +
from (
 +
select gs.Id as C
 +
from viewShipment gs
 +
inner join tblShipmentLog sl on ( gs.Id=sl.ShipmentId )
 +
where sl.FieldName = 'ClientCompany'
 +
and sl.ChangeTime > '2012-01-28 23:01:04'
 +
and gs.IsDeleted = 0
 +
and( gs.IsApprovedByPlanner is null
 +
or gs.IsApprovedByPlanner = 1 )
 +
and( gs.ClientCompanyId = '1701' or gs.VisibilityClientCompanyId1 = '1701'
 +
or gs.VisibilityClientCompanyId2 = '1701' or gs.VisibilityClientCompanyId3 = '1701' )
 +
and gs.Cancelled = 0
 +
union all
 +
select gs.Id as C
 +
from viewShipment gs
 +
where gs.CreatedOnDate > '2012-01-28 23:01:04'
 +
and gs.IsDeleted = 0
 +
and( gs.IsApprovedByPlanner is null
 +
or gs.IsApprovedByPlanner = 1 )
 +
and( gs.ClientCompanyId = '1701' or gs.VisibilityClientCompanyId1 = '1701'
 +
or gs.VisibilityClientCompanyId2 = '1701' or gs.VisibilityClientCompanyId3 = '1701' )
 +
and gs.Cancelled = 0
 +
 +
 +
Or even better. Instead of looking into both "CreatedOnDate" field and "tblShipmentLog" table, only one criterion could be used, by either
 +
a) tblShipmentLog containing shipment creation record with FieldName='ALL created', or
 +
b) UpdatedOnDate field be maintained, updated on shipment creation and then on each operation that results in population of tblShipmentLog (can be done initially with a trigger).
 +
 +
b)
 +
 +
== SOW 13 ==
 +
 +
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
 +
 +
*'''0003591: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 13)'''
 +
 +
8.
 +
 +
Implementation of viewArchivedShipmentId causes frequent (one a day so far) deadlocks between tblAirShipment and viewArchivedShipmentId and, similarly, between tblOceanShipment and viewArchivedShipmentId.
 +
 +
Noteworthy the deadlock happens when a simple select from viewArchivedShipmentId coincides with a simple single row update of tblAirShipment or tblOceanShipment.
 +
 +
== SOW 14 ==
 +
 +
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
 +
 +
*'''0003592: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 14)'''
 +
 +
9.
 +
 +
Query used to get shipment list:
 +
 +
# Time: 120207 4:16:48
 +
# User@Host: dev[dev] @ localhost [127.0.0.1]
 +
# Query_time: 7.237233 Lock_time: 0.001029 Rows_sent: 15 Rows_examined: 13423184910908060672
 +
SET timestamp=1328606208;
 +
SELECT distinct gs.`Id` as Id, gs.`TransportModeId` as TransportModeId, gs.`CreatedOnDate` as CreatedOnDate, gs.`AuthorizedMOT` as AuthorizedMOT, gs.`AuthorizedDate` as AuthorizedDate, wl.`Id` as `WatchListId`, wl.`NickName`, wl.`IsInTop5`, wl.`IsInWatchList`, wl.`ListOrder` AS `ListOrder`, ac1.`CompanyName` AS `ShipperCompany`, ac2.`CompanyName` AS `ConsigneeCompany` FROM tblGenericShipment gs inner join (select distinct gs.`Id` from ( select gs.Id AS Id, gs.ClientCompanyId AS ClientCompanyId, gs.`TransportModeId` AS `TransportModeId`, gs.`ShipperAddressId`, gs.`ConsigneeAddressId`, originCnt.`RegionId` AS `OriginPointRegionId`, destCnt.`RegionId` AS `DestinationPointRegionId`, ccgr1.ClientCompanyGroupId AS ClientCompanyGroupId, sccr1.ClientCompanyId AS VisibilityClientCompanyIdA, ccgr2.ClientCompanyGroupId AS VisibilityClientCompanyGroupIdA, ccgr3.ClientCompanyId AS VisibilityClientCompanyIdB, sccgr1.ClientCompanyGroupId AS VisibilityClientCompanyGroupIdB from tblGenericShipment gs left join tblClientCompanyGroupRelation ccgr1 on gs.ClientCompanyId=ccgr1.ClientCompanyId left join tblShipment2ClientCompanyRelation sccr1 on gs.Id=sccr1.ShipmentId left join `tblCountry` originCnt on originCnt.`Id` = gs.`OriginPointCountryId` left join `tblCountry` destCnt on destCnt.`Id` = gs.`DestinationPointCountryId` left join tblShipment2ClientCompanyRelation sccr2 join tblClientCompanyGroupRelation ccgr2 on sccr2.ClientCompanyId=ccgr2.ClientCompanyId on gs.Id=sccr2.ShipmentId left join tblShipment2ClientCompanyGroupRelation sccgr1 join tblClientCompanyGroupRelation ccgr3 on sccgr1.ClientCompanyGroupId=ccgr3.ClientCompanyGroupId on gs.Id=sccgr1.ShipmentId where gs.`IsDeleted` = 0 AND gs.`Cancelled` = 0 ) gs where ((((gs.ClientCompanyId='1' OR gs.VisibilityClientCompanyIdA='1' OR gs.VisibilityClientCompanyIdB='1')))) OR ((((gs.ClientCompanyId='87' OR gs.VisibilityClientCompanyIdA='87' OR gs.VisibilityClientCompanyIdB='87')))) ) ids on gs.`Id` = ids.Id left join `tblAddressTransportation` `a1` left join `tblAddressCompanyCity` `ac1` on `a1`.`CompanyCityId` = `ac1`.`Id` on gs.`ShipperAddressId` = `a1`.`Id` left join `tblAddressTransportation` `a2` left join `tblAddressCompanyCity` `ac2` on `a2`.`CompanyCityId` = `ac2`.`Id` on gs.`ConsigneeAddressId` = `a2`.`Id` join tblClientUser cu on cu.`Id` = 383 left join `tblClientUserWatchList` wl on wl.`ShipmentId` = ids.`Id` and wl.`UserId` = cu.`GenericUserId` left join `tblShipmentAuthorizationStatusHistory` sash on sash.id=gs.`AuthorizationStatusHistoryId` WHERE (gs.`IsApprovedByPlanner` is null OR gs.`IsApprovedByPlanner` ) AND gs.`Cancelled` = 0 AND gs.Id IN ( SELECT gs.Id FROM tblGenericShipment gs INNER JOIN tblAirShipment ais ON gs.Id = ais.GenericShipmentId WHERE gs.TransportModeId = 6 AND ais.ImportActualDeliveryDate = 0000-00-00 UNION ALL SELECT gs.Id FROM tblGenericShipment gs INNER JOIN tblTruckingShipment ts ON gs.Id = ts.GenericShipmentId WHERE ts.ActualDeliveryDate = 0000-00-00 AND (gs.TransportModeId in (7, 9)) /*truck-air,domestic: 7, 9*/ UNION ALL SELECT gs.Id FROM tblGenericShipment gs WHERE (gs.TransportModeId = 8) AND /*truck-ocean: 8*/ (NOT EXISTS (SELECT 1 FROM `tblContainer` cont WHERE cont.ShipmentId = gs.Id) OR EXISTS ( SELECT 1 FROM tblContainer cont WHERE cont.ShipmentId = gs.Id AND cont.ContainerActualDelivery = 0000-00-00 )) UNION ALL SELECT gs.Id FROM tblGenericShipment gs INNER JOIN tblOceanShipment os ON gs.Id = os.GenericShipmentId WHERE os.ActualDeliveryDate = 0000-00-00 AND (gs.TransportModeId in (4, 5)) /*ocean LCL: 4, 5*/ UNION ALL SELECT gs.Id FROM tblGenericShipment gs WHERE (gs.TransportModeId in (2, 3)) AND /*ocean FCL: 2, 3*/ ( (gs.`MasterId` is null AND (NOT EXISTS ( SELECT 1 FROM `tblContainer` cont WHERE cont.ShipmentId = gs.Id) OR EXISTS ( SELECT 1 FROM tblContainer cont WHERE cont.ShipmentId = gs.Id AND cont.ContainerActualDelivery = 0000-00-00 )) ) OR (gs.`MasterId` is not null AND (NOT EXISTS ( SELECT 1 FROM `tblContainer` cont WHERE cont.`MasterId` = gs.`MasterId`) OR EXISTS ( SELECT 1 FROM tblContainer cont WHERE cont.MasterId = gs.MasterId AND cont.ContainerActualDelivery = 0000-00-00 )) ) ) ) AND gs.TransportModeId IN (1, 2, 3, 4, 5, 9, 8, 7) ORDER BY Id DESC LIMIT 15, 15;
 +
 +
There are two parts.
 +
1. This particular customer has few hundred shipments visible. To get complete, ordered list of visible shipments it should be possible to do it in a fraction of a second.
 +
2. Even if the above, part 1 takes too long to develop, following approach could significantly improve user experience:
 +
a) Load complete list of shipment IDs into memory.
 +
b) Serve paging requests using cached IDs.
 +
c) Refresh list of shipment IDs if cache is more than 2 minutes old.
 +
 +
== SOW 15 ==
 +
 +
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
 +
 +
*'''0003593: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 15)'''
 +
 +
10.
 +
 +
After I click a shipment from the shipment list I see following query being executed:
 +
 +
# Time: 120207 4:16:54
 +
# User@Host: dev[dev] @ localhost [127.0.0.1]
 +
# Query_time: 6.039052 Lock_time: 0.000174 Rows_sent: 1 Rows_examined: 1205866
 +
SET timestamp=1328606214;
 +
SELECT gs.`Id` as `Id`, wl.`NickName` as `NickName`, gs.`IsDeleted` as `IsDeleted`, gs.`Cancelled` as `Cancelled` FROM `tblGenericShipment` gs inner join (select distinct gs.`Id` from ( select gs.Id AS Id, gs.ClientCompanyId AS ClientCompanyId, gs.`TransportModeId` AS `TransportModeId`, gs.`ShipperAddressId`, gs.`ConsigneeAddressId`, originCnt.`RegionId` AS `OriginPointRegionId`, destCnt.`RegionId` AS `DestinationPointRegionId`, ccgr1.ClientCompanyGroupId AS ClientCompanyGroupId, sccr1.ClientCompanyId AS VisibilityClientCompanyIdA, ccgr2.ClientCompanyGroupId AS VisibilityClientCompanyGroupIdA, ccgr3.ClientCompanyId AS VisibilityClientCompanyIdB, sccgr1.ClientCompanyGroupId AS VisibilityClientCompanyGroupIdB from tblGenericShipment gs left join tblClientCompanyGroupRelation ccgr1 on gs.ClientCompanyId=ccgr1.ClientCompanyId left join tblShipment2ClientCompanyRelation sccr1 on gs.Id=sccr1.ShipmentId left join `tblCountry` originCnt on originCnt.`Id` = gs.`OriginPointCountryId` left join `tblCountry` destCnt on destCnt.`Id` = gs.`DestinationPointCountryId` left join tblShipment2ClientCompanyRelation sccr2 join tblClientCompanyGroupRelation ccgr2 on sccr2.ClientCompanyId=ccgr2.ClientCompanyId on gs.Id=sccr2.ShipmentId left join tblShipment2ClientCompanyGroupRelation sccgr1 join tblClientCompanyGroupRelation ccgr3 on sccgr1.ClientCompanyGroupId=ccgr3.ClientCompanyGroupId on gs.Id=sccgr1.ShipmentId where gs.`IsDeleted` = 0 AND gs.`Cancelled` = 0 ) gs where ((((gs.ClientCompanyId='1' OR gs.VisibilityClientCompanyIdA='1' OR gs.VisibilityClientCompanyIdB='1')))) OR ((((gs.ClientCompanyId='87' OR gs.VisibilityClientCompanyIdA='87' OR gs.VisibilityClientCompanyIdB='87')))) AND gs.Id=374760 ) ids on ids.`Id` = gs.`Id` inner join `tblClientUser` cu left join `tblClientUserWatchList` wl on wl.`ShipmentId` = gs.`Id` and wl.`UserId` = cu.`GenericUserId` WHERE cu.Id= 383 LIMIT 1;
 +
 +
Here, to show a shipment visibility rules are being checked.
 +
In many cases it's redundant. App could cache shipment IDs as proposed in issue #9. And bypass checking visibility rules in cases where shipment was not entered explicitly in CT field, but clicked in the shipment list.
 +
 +
== SOW 16 ==
 +
 +
'''Parent Mantis''': 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.
 +
 +
*'''0003594: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 16)'''
 +
 +
11.
 +
 +
Found following statement executing:
 +
 +
SELECT gu.Id, gu.FirstName, gu.LastName, gu.Birthday, gu.Email, gu.PhoneNumber FROM tblGenericUser AS gu, tblCommentsMailedTo AS cm WHERE gu.Id = cm.UserId AND cm.CommentId = '230101'
 +
 +
Two issues
 +
1) minor, presumably fixed in DEV. CommentId which is integer is compared to character literal.
 +
2) tblCommentsMailedTo has 135K rows. CommentId selectivity is about 60K. Yet there is no index on CommentId.

Current revision as of 14:25, 30 May 2012


Contents

[edit] Info

  • parent mantis: 0000028: [Optimization] .... parent
  • mantis category: Optimization

[edit] Requirements

Keep system speed/performance at the normal levels NOW and IN THE FUTURE.

It is assumed that to satisfy above we have to:

  • Optimize existing code, DB, etc to
  • Develop strategy for the future development

[Perry: 2/22/2012 comments/ideas regarding approach and strategy]

  • Separate web/app server from database server.
  • Introduce scheduled batch reporting features for processor/memory intensive reports
  • Develop plan to proactively monitor server utilization (Evaluate available tools - I know BMC Software has tools to report on various data points across a specified period of time)
  • Look for MySQL and Tomcat tools to monitor SQL statements and long running processes. Evaluate database explain plans - optimize SQL statements and introduce indexes where necessary.

[edit] SOW 0

  • 0001357: (Misc) Create CT2 2015 DB (Ops/Acc modules) Test speed, optimize (requirements for how fast each CT2 table will grow attached)
	For every table:
a) record how many records are now
b) find out (interview BAs) - how many records will be there at the end of 2010
c) estimate rate of growth per year after 2010
d) post this info into http://ct.jaguarfreight.com/wiki/Intro_into_CT2_DB [^] [^]

Based on the info above CT2 Architect will:

a) create database similar in size to dec 2010 levels
b) estimate speed of application
c) compare above with target
d) optimize DB and algorithms to reach the target speed 

[edit] SOW 1

* 0002233: (CT2 Misc) Ongoing DB optimization (SOW 1)

[edit] Implementation Notes SOW 1

  • P/l for mult cts report refactoring
  • Billing required report refactoring
  • P/l report for CT group and for Master - lines from one PI for different cts was grouped into one line per Marc's request

[edit] SOW 2

  • 0003312: (CT2 Misc) Ongoing DB optimization (SOW 2)

[edit] Implementation Notes SOW 2


1. Refactored ChargeCodes to ShipmentInvoices relation table (direct access without using tblTableOfCharges table).
2. Reworked ChargeCode persistence methods to speed-up Invoices access. Was improved speed of reports generation 

report p&l for multiple cts "Special View: Charge Code Group" optimized.
this report now works 10-20 times faster

[edit] SOW 3

[edit] Implementation Notes SOW 3

  • Refactor IsBasicApproval field (Boolean) to PlannerType field. Will simplify Queries logic.
  • Migrate current Planner Shipment Authorization Status fields to tblGenericShipment table
    • Avoid joining pair of long tables tblGenericShipment and tblShipmentAuthorizationStatusHistory
    • improve performance of Planner/Shipper portal and Reporting
  • Refactored Query class
    • removed quotation marks of Integer Parameter in Queries composition. As Vlad noted, comparing Integer to Character values has significant negative effect on performance.


[edit] SOW 4

Kostya: I need to add indexes and constraints to PDF dosc tables - for individual cts and masters. To implement this I should drop some "garbage" records from these several tables so I need time to review existing records and remove all that not have links to shipments that correnctly database has. Only after this I can add needed constraints to the tables. This indexes and constraints will improve database integrity and force speed.

[edit] SOW 5 Re-architect EDI interface

  • 0003523: Re-architect the interface: connect problem between CT2 & FTP Server for TMS, Air Status & EUAN BP.(related: 0003479)

Main Objective: Prevent a single FTP thread from indefinitely blocking other FTP threads due to any error condition (either connection problem, transmission problem, etc..). This applies to all FTP interfaces including TMS, Trendset, etc..

Proposed Solution:

  • Add timeout to all FTP processes to return control back to the application after a predetermined timeout value. Timeout value should not be hard-coded, but either config files or database driven.
  • Modify existing FTP processes that delete and then copy a file from/to the same server to issue rename/move commands instead. Waste a lot of bandwidth by deleting and then copying the same file over the same server.
  • FTP process currently attempts to create directory to store invalid files for each session. Modify process to only issue this command after initial environment build. After an environment is built, then do not issue this command for every FTP session.
  • (Nice-to-have and will not need to implement in this SOW) Modify process to not open/close FTP connection for each and every file. Keep an FTP session open for a set time period to process all requests for that cycle, and then close the connection.

[edit] SOW 6

Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.

  • 0003584: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 6)

1.

blPDFReports is missing index on ShipmentId This causes occasional full rescan of a huge table

[edit] SOW 7

Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.

  • 0003585: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 7)

2.

  1. Query_time: 6 Lock_time: 0 Rows_sent: 0 Rows_examined: 2075813

INSERT INTO tblTempShipment0 (Id,ShipmentGroupId) (SELECT gs.Id, gs.ShipmentGroupId FROM viewShipment gs LEFT JOIN tblShipmentGroup sg ON gs.ShipmentGroupId=sg.Id WHERE gs.IsDeleted=0 AND (gs.IsApprovedByPlanner IS NULL OR gs.IsApprovedByPlanner=1) AND gs.IsLossAuthorized=0 GROUP BY gs.Id)

The statement depends on more than 90% of shipments in the database. Even there are about 90K shipments in the database, the statement touches about 2M.

As database grows the statement will take more and more to execute.

[edit] SOW 8

Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.

  • 0003586: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 8)

3.

  1. Query_time: 6 Lock_time: 0 Rows_sent: 1 Rows_examined: 4465332

SELECT COUNT(gs.Id) AS C FROM viewShipment gs WHERE (gs.TransportModeId BETWEEN 2 AND 9) AND gs.ActualPickupDate='0000-00-00' AND ((NOT gs.EstimatedPickupDate='0000-00-00' AND gs.EstimatedPickupDate<'2012-01-23') OR (gs.EstimatedPickupDate BETWEEN '2012-01-23' AND '2012-01-23' + INTERVAL 7 DAY)) AND gs.IsDeleted=0 AND (gs.IsApprovedByPlanner IS NULL OR gs.IsApprovedByPlanner=1) AND ((gs.TransportModeId='2' OR gs.TransportModeId ='3' OR gs.TransportModeId='4' OR gs.TransportModeId='5' OR gs.TransportModeId='6' OR gs.TransportModeId='8' OR gs.TransportModeId='9')) AND gs.Cancelled=0;

This query attempts to limit amount of data using 2012-01-23 cut off. However design of viewShipment prohibits that. Query touches 4.5M rows. The query will get slower as shipments are added to the database.

[edit] SOW 9

Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.

  • 0003587: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 9)

4.

Design of viewShipment

It is complex, it seems that mySQL optimizer is often unable to understand nature of queries ran against this view. Then it falls back to conservative execution plan. Which causes all shipments to be touched and all extension tables. Which results in millions rows being accessed.

Perhaps more narrow view design (less joins and some unions)less problematic for the optimizer.

[edit] SOW 10

Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.

  • 0003588: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 10)

5.

Design of viewShipment

Various timestamps in the view are calculated in a very complex manner ("case" expressions with functions in them). These timestamps could be very useful in limiting scope of the query (selecting only recent shipments, etc). However, even industry leading databases are generally unable deal with such cryptic column definitions. These columns needs to be indexed in such a way, that MySQL optimizer is able to make use of them.

Another way to look at this: If we look at a single shipment we either have complexity of maintaining explicit timestamps on each update of the shipment, but have less complex reports. Or simplify updates, but deal with complexity in reports. Which looks as if there was parity. There isn't in my view. Once shipment is completed it adds to complexity indefinitely into the future. We have much more completed shipments, than those in progress. Complexity should be where volume is lower - shipments updates.

[edit] SOW 11

Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.

  • 0003589: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 11)

6. Design of viewShipment

Driving table in the view is tblGenericShipment. It's preferable that other tables were joined to the driving table on their primary keys. Currently it's not the case. Very important tables tblOceanShipment, tblAirShipment, tblTruckingShipment all have duplicates(up to 6 fold) in them. This alone requires "distinct" added to some queries, which makes them slower.

When optimizer doesn't see that join condition necessarily returns at most one dependent row for one driving table row, it can choose suboptimal execution plan.

Besides tblOceanShipment, tblAirShipment, tblTruckingShipment following tables also need to change their primary key: tblShipment2ClientCompanyGroupRelation, tblShipment2ClientCompanyRelation

Possibly there are more such tables.

[edit] SOW 12

Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.

  • 0003590: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 12)

7.

  1. Time: 120129 14:01:16
  2. User@Host: root[root] @ [127.0.0.1]
  3. Query_time: 62 Lock_time: 0 Rows_sent: 1 Rows_examined: 85612

SELECT COUNT(gs.Id) AS C FROM viewShipment gs WHERE (gs.CreatedOnDate>'2012-01-28 23:01:04' OR gs.Id IN (SELECT ShipmentId FROM tblShipmentLog WHERE FieldName='ClientCompany' AND ChangeTime>'2012-01-28 23:01:04')) AND gs.IsDeleted=0 AND (gs.IsApprovedByPlanner IS NULL OR gs.IsApprovedByPlanner=1) AND (((gs.ClientCompanyId='1701' OR gs.VisibilityClientCompanyId1='1701' OR gs.VisibilityClientCompanyId2='1701' OR gs.VisibilityClientCompanyId3='1701'))) AND gs.Cancelled=0;

This can be rewritten to use "union" instead of "or". That gives substantial speedup:

select count(distinct C) as C
from (
select gs.Id as C
from viewShipment gs
inner join tblShipmentLog sl on ( gs.Id=sl.ShipmentId )
where sl.FieldName = 'ClientCompany'
and sl.ChangeTime > '2012-01-28 23:01:04'
and gs.IsDeleted = 0
and( gs.IsApprovedByPlanner is null
or gs.IsApprovedByPlanner = 1 )
and( gs.ClientCompanyId = '1701' or gs.VisibilityClientCompanyId1 = '1701'
or gs.VisibilityClientCompanyId2 = '1701' or gs.VisibilityClientCompanyId3 = '1701' )
and gs.Cancelled = 0
union all
select gs.Id as C
from viewShipment gs
where gs.CreatedOnDate > '2012-01-28 23:01:04'
and gs.IsDeleted = 0
and( gs.IsApprovedByPlanner is null
or gs.IsApprovedByPlanner = 1 )
and( gs.ClientCompanyId = '1701' or gs.VisibilityClientCompanyId1 = '1701'
or gs.VisibilityClientCompanyId2 = '1701' or gs.VisibilityClientCompanyId3 = '1701' )
and gs.Cancelled = 0


Or even better. Instead of looking into both "CreatedOnDate" field and "tblShipmentLog" table, only one criterion could be used, by either a) tblShipmentLog containing shipment creation record with FieldName='ALL created', or b) UpdatedOnDate field be maintained, updated on shipment creation and then on each operation that results in population of tblShipmentLog (can be done initially with a trigger).

b)

[edit] SOW 13

Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.

  • 0003591: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 13)

8.

Implementation of viewArchivedShipmentId causes frequent (one a day so far) deadlocks between tblAirShipment and viewArchivedShipmentId and, similarly, between tblOceanShipment and viewArchivedShipmentId.

Noteworthy the deadlock happens when a simple select from viewArchivedShipmentId coincides with a simple single row update of tblAirShipment or tblOceanShipment.

[edit] SOW 14

Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.

  • 0003592: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 14)

9.

Query used to get shipment list:

  1. Time: 120207 4:16:48
  2. User@Host: dev[dev] @ localhost [127.0.0.1]
  3. Query_time: 7.237233 Lock_time: 0.001029 Rows_sent: 15 Rows_examined: 13423184910908060672

SET timestamp=1328606208; SELECT distinct gs.`Id` as Id, gs.`TransportModeId` as TransportModeId, gs.`CreatedOnDate` as CreatedOnDate, gs.`AuthorizedMOT` as AuthorizedMOT, gs.`AuthorizedDate` as AuthorizedDate, wl.`Id` as `WatchListId`, wl.`NickName`, wl.`IsInTop5`, wl.`IsInWatchList`, wl.`ListOrder` AS `ListOrder`, ac1.`CompanyName` AS `ShipperCompany`, ac2.`CompanyName` AS `ConsigneeCompany` FROM tblGenericShipment gs inner join (select distinct gs.`Id` from ( select gs.Id AS Id, gs.ClientCompanyId AS ClientCompanyId, gs.`TransportModeId` AS `TransportModeId`, gs.`ShipperAddressId`, gs.`ConsigneeAddressId`, originCnt.`RegionId` AS `OriginPointRegionId`, destCnt.`RegionId` AS `DestinationPointRegionId`, ccgr1.ClientCompanyGroupId AS ClientCompanyGroupId, sccr1.ClientCompanyId AS VisibilityClientCompanyIdA, ccgr2.ClientCompanyGroupId AS VisibilityClientCompanyGroupIdA, ccgr3.ClientCompanyId AS VisibilityClientCompanyIdB, sccgr1.ClientCompanyGroupId AS VisibilityClientCompanyGroupIdB from tblGenericShipment gs left join tblClientCompanyGroupRelation ccgr1 on gs.ClientCompanyId=ccgr1.ClientCompanyId left join tblShipment2ClientCompanyRelation sccr1 on gs.Id=sccr1.ShipmentId left join `tblCountry` originCnt on originCnt.`Id` = gs.`OriginPointCountryId` left join `tblCountry` destCnt on destCnt.`Id` = gs.`DestinationPointCountryId` left join tblShipment2ClientCompanyRelation sccr2 join tblClientCompanyGroupRelation ccgr2 on sccr2.ClientCompanyId=ccgr2.ClientCompanyId on gs.Id=sccr2.ShipmentId left join tblShipment2ClientCompanyGroupRelation sccgr1 join tblClientCompanyGroupRelation ccgr3 on sccgr1.ClientCompanyGroupId=ccgr3.ClientCompanyGroupId on gs.Id=sccgr1.ShipmentId where gs.`IsDeleted` = 0 AND gs.`Cancelled` = 0 ) gs where ((((gs.ClientCompanyId='1' OR gs.VisibilityClientCompanyIdA='1' OR gs.VisibilityClientCompanyIdB='1')))) OR ((((gs.ClientCompanyId='87' OR gs.VisibilityClientCompanyIdA='87' OR gs.VisibilityClientCompanyIdB='87')))) ) ids on gs.`Id` = ids.Id left join `tblAddressTransportation` `a1` left join `tblAddressCompanyCity` `ac1` on `a1`.`CompanyCityId` = `ac1`.`Id` on gs.`ShipperAddressId` = `a1`.`Id` left join `tblAddressTransportation` `a2` left join `tblAddressCompanyCity` `ac2` on `a2`.`CompanyCityId` = `ac2`.`Id` on gs.`ConsigneeAddressId` = `a2`.`Id` join tblClientUser cu on cu.`Id` = 383 left join `tblClientUserWatchList` wl on wl.`ShipmentId` = ids.`Id` and wl.`UserId` = cu.`GenericUserId` left join `tblShipmentAuthorizationStatusHistory` sash on sash.id=gs.`AuthorizationStatusHistoryId` WHERE (gs.`IsApprovedByPlanner` is null OR gs.`IsApprovedByPlanner` ) AND gs.`Cancelled` = 0 AND gs.Id IN ( SELECT gs.Id FROM tblGenericShipment gs INNER JOIN tblAirShipment ais ON gs.Id = ais.GenericShipmentId WHERE gs.TransportModeId = 6 AND ais.ImportActualDeliveryDate = 0000-00-00 UNION ALL SELECT gs.Id FROM tblGenericShipment gs INNER JOIN tblTruckingShipment ts ON gs.Id = ts.GenericShipmentId WHERE ts.ActualDeliveryDate = 0000-00-00 AND (gs.TransportModeId in (7, 9)) /*truck-air,domestic: 7, 9*/ UNION ALL SELECT gs.Id FROM tblGenericShipment gs WHERE (gs.TransportModeId = 8) AND /*truck-ocean: 8*/ (NOT EXISTS (SELECT 1 FROM `tblContainer` cont WHERE cont.ShipmentId = gs.Id) OR EXISTS ( SELECT 1 FROM tblContainer cont WHERE cont.ShipmentId = gs.Id AND cont.ContainerActualDelivery = 0000-00-00 )) UNION ALL SELECT gs.Id FROM tblGenericShipment gs INNER JOIN tblOceanShipment os ON gs.Id = os.GenericShipmentId WHERE os.ActualDeliveryDate = 0000-00-00 AND (gs.TransportModeId in (4, 5)) /*ocean LCL: 4, 5*/ UNION ALL SELECT gs.Id FROM tblGenericShipment gs WHERE (gs.TransportModeId in (2, 3)) AND /*ocean FCL: 2, 3*/ ( (gs.`MasterId` is null AND (NOT EXISTS ( SELECT 1 FROM `tblContainer` cont WHERE cont.ShipmentId = gs.Id) OR EXISTS ( SELECT 1 FROM tblContainer cont WHERE cont.ShipmentId = gs.Id AND cont.ContainerActualDelivery = 0000-00-00 )) ) OR (gs.`MasterId` is not null AND (NOT EXISTS ( SELECT 1 FROM `tblContainer` cont WHERE cont.`MasterId` = gs.`MasterId`) OR EXISTS ( SELECT 1 FROM tblContainer cont WHERE cont.MasterId = gs.MasterId AND cont.ContainerActualDelivery = 0000-00-00 )) ) ) ) AND gs.TransportModeId IN (1, 2, 3, 4, 5, 9, 8, 7) ORDER BY Id DESC LIMIT 15, 15;

There are two parts. 1. This particular customer has few hundred shipments visible. To get complete, ordered list of visible shipments it should be possible to do it in a fraction of a second. 2. Even if the above, part 1 takes too long to develop, following approach could significantly improve user experience: a) Load complete list of shipment IDs into memory. b) Serve paging requests using cached IDs. c) Refresh list of shipment IDs if cache is more than 2 minutes old.

[edit] SOW 15

Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.

  • 0003593: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 15)

10.

After I click a shipment from the shipment list I see following query being executed:

  1. Time: 120207 4:16:54
  2. User@Host: dev[dev] @ localhost [127.0.0.1]
  3. Query_time: 6.039052 Lock_time: 0.000174 Rows_sent: 1 Rows_examined: 1205866

SET timestamp=1328606214; SELECT gs.`Id` as `Id`, wl.`NickName` as `NickName`, gs.`IsDeleted` as `IsDeleted`, gs.`Cancelled` as `Cancelled` FROM `tblGenericShipment` gs inner join (select distinct gs.`Id` from ( select gs.Id AS Id, gs.ClientCompanyId AS ClientCompanyId, gs.`TransportModeId` AS `TransportModeId`, gs.`ShipperAddressId`, gs.`ConsigneeAddressId`, originCnt.`RegionId` AS `OriginPointRegionId`, destCnt.`RegionId` AS `DestinationPointRegionId`, ccgr1.ClientCompanyGroupId AS ClientCompanyGroupId, sccr1.ClientCompanyId AS VisibilityClientCompanyIdA, ccgr2.ClientCompanyGroupId AS VisibilityClientCompanyGroupIdA, ccgr3.ClientCompanyId AS VisibilityClientCompanyIdB, sccgr1.ClientCompanyGroupId AS VisibilityClientCompanyGroupIdB from tblGenericShipment gs left join tblClientCompanyGroupRelation ccgr1 on gs.ClientCompanyId=ccgr1.ClientCompanyId left join tblShipment2ClientCompanyRelation sccr1 on gs.Id=sccr1.ShipmentId left join `tblCountry` originCnt on originCnt.`Id` = gs.`OriginPointCountryId` left join `tblCountry` destCnt on destCnt.`Id` = gs.`DestinationPointCountryId` left join tblShipment2ClientCompanyRelation sccr2 join tblClientCompanyGroupRelation ccgr2 on sccr2.ClientCompanyId=ccgr2.ClientCompanyId on gs.Id=sccr2.ShipmentId left join tblShipment2ClientCompanyGroupRelation sccgr1 join tblClientCompanyGroupRelation ccgr3 on sccgr1.ClientCompanyGroupId=ccgr3.ClientCompanyGroupId on gs.Id=sccgr1.ShipmentId where gs.`IsDeleted` = 0 AND gs.`Cancelled` = 0 ) gs where ((((gs.ClientCompanyId='1' OR gs.VisibilityClientCompanyIdA='1' OR gs.VisibilityClientCompanyIdB='1')))) OR ((((gs.ClientCompanyId='87' OR gs.VisibilityClientCompanyIdA='87' OR gs.VisibilityClientCompanyIdB='87')))) AND gs.Id=374760 ) ids on ids.`Id` = gs.`Id` inner join `tblClientUser` cu left join `tblClientUserWatchList` wl on wl.`ShipmentId` = gs.`Id` and wl.`UserId` = cu.`GenericUserId` WHERE cu.Id= 383 LIMIT 1;

Here, to show a shipment visibility rules are being checked. In many cases it's redundant. App could cache shipment IDs as proposed in issue #9. And bypass checking visibility rules in cases where shipment was not entered explicitly in CT field, but clicked in the shipment list.

[edit] SOW 16

Parent Mantis: 003398: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages.

  • 0003594: CT slow and processing; then randomly users are receiving Java.SQL Debug Error Messages (SOW 16)

11.

Found following statement executing:

SELECT gu.Id, gu.FirstName, gu.LastName, gu.Birthday, gu.Email, gu.PhoneNumber FROM tblGenericUser AS gu, tblCommentsMailedTo AS cm WHERE gu.Id = cm.UserId AND cm.CommentId = '230101'

Two issues 1) minor, presumably fixed in DEV. CommentId which is integer is compared to character literal. 2) tblCommentsMailedTo has 135K rows. CommentId selectivity is about 60K. Yet there is no index on CommentId.

Personal tools