Commit 07d5117

Add to TODO item about raw device performance.
1 parent 48e6cfc commit 07d5117

1 file changed, +113 -2 lines changed

doc/TODO.detail/performance

Lines changed: 113 additions & 2 deletions
@@ -345,7 +345,7 @@ From owner-pgsql-hackers@hub.org Tue Oct 19 10:31:10 1999
 Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA29087
 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:31:08 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.8 $) with ESMTP id KAA27535 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.9 $) with ESMTP id KAA27535 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
 Received: from localhost (majordom@localhost)
 by hub.org (8.9.3/8.9.3) with SMTP id KAA30328;
 Tue, 19 Oct 1999 10:12:10 -0400 (EDT)
@@ -454,7 +454,7 @@ From owner-pgsql-hackers@hub.org Tue Oct 19 21:25:30 1999
 Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA28130
 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:25:26 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.8 $) with ESMTP id VAA10512 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.9 $) with ESMTP id VAA10512 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
 Received: from localhost (majordom@localhost)
 by hub.org (8.9.3/8.9.3) with SMTP id VAA50745;
 Tue, 19 Oct 1999 21:07:23 -0400 (EDT)
@@ -1002,3 +1002,114 @@ Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
 phone: +007(095)939-16-83, +007(095)939-23-83


+From pgsql-general-owner+M2497@hub.org Fri Jun 16 18:31:03 2000
+Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
+by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04165
+for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:31:01 -0400 (EDT)
+Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.9 $) with ESMTP id RAA13110 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:20:12 -0400 (EDT)
+Received: from hub.org (majordom@localhost [127.0.0.1])
+by hub.org (8.10.1/8.10.1) with SMTP id e5GLDaM14477;
+Fri, 16 Jun 2000 17:13:36 -0400 (EDT)
+Received: from home.dialix.com ([203.15.150.26])
+by hub.org (8.10.1/8.10.1) with ESMTP id e5GLCQM14064
+for <pgsql-general@postgresql.org>; Fri, 16 Jun 2000 17:12:27 -0400 (EDT)
+Received: from nemeton.com.au ([202.76.153.71])
+by home.dialix.com (8.9.3/8.9.3/JustNet) with SMTP id HAA95516
+for <pgsql-general@postgresql.org>; Sat, 17 Jun 2000 07:11:44 +1000 (EST)
+(envelope-from giles@nemeton.com.au)
+Received: (qmail 10213 invoked from network); 16 Jun 2000 09:52:29 -0000
+Received: from nemeton.com.au (203.8.3.17)
+by nemeton.com.au with SMTP; 16 Jun 2000 09:52:29 -0000
+To: Jurgen Defurne <defurnj@glo.be>
+cc: Mark Stier <kalium@gmx.de>,
+postgreSQL general mailing list <pgsql-general@postgresql.org>
+Subject: Re: [GENERAL] optimization by removing the file system layer?
+In-Reply-To: Message from Jurgen Defurne <defurnj@glo.be>
+of "Thu, 15 Jun 2000 20:26:57 +0200." <39491FF1.E1E583F8@glo.be>
+Date: Fri, 16 Jun 2000 19:52:28 +1000
+Message-ID: <10210.961149148@nemeton.com.au>
+From: Giles Lean <giles@nemeton.com.au>
+X-Mailing-List: pgsql-general@postgresql.org
+Precedence: bulk
+Sender: pgsql-general-owner@hub.org
+Status: OR
+
+
+
+> I think that the Un*x filesystem is one of the reasons that large
+> database vendors rather use raw devices, than filesystem storage
+> files.
+
+This used to be the preference, back in the late 80s and possibly
+early 90s. I'm seeing a preference toward using the filesystem now,
+possibly with some sort of async I/O and co-operation from the OS
+filesystem about interactions with the filesystem cache.
+
+Performance preferences don't stand still. The hardware changes, the
+software changes, the volume of data changes, and different solutions
+become preferable.
+
+> Using a raw device on the disk gives them the possibility to have
+> complete control over their files, indices and objects without being
+> bothered by the operating system.
+>
+> This speeds up things in several ways :
+> - the least possible OS intervention
+
+Not that this is especially useful, necessarily. If the "raw" device
+is in fact managed by a logical volume manager doing mirroring onto
+some sort of storage array there is still plenty of OS code involved.
+
+The cost of using a filesystem in addition may not be much if anything
+and of course a filesystem is considerably more flexible to
+administer (backup, move, change size, check integrity, etc.)
+
+> - choose block sizes according to applications
+> - reducing fragmentation
+> - packing data in nearby cilinders
+
+... but when this storage area is spread over multiple mechanisms in a
+smart storage array with write caching, you've no idea what is where
+anyway. Better to let the hardware or at least the OS manage this;
+there are so many levels of caching between a database and the
+magnetic media that working hard to influence layout is almost
+certainly a waste of time.
+
+Kirk McKusick tells a lovely story that once upon a time it used to be
+sensible to check some registers on a particular disk controller to
+find out where the heads were when scheduling I/O. Needless to say,
+that is history now!
+
+There's a considerable cost in complexity and code in using "raw"
+storage too, and it's not a one off cost: as the technologies change,
+the "fast" way to do things will change and the code will have to be
+updated to match. Better to leave this to the OS vendor where
+possible, and take advantage of the tuning they do.
+
+> - Anyone other ideas -> the sky is the limit here
+
+> It also aids portability, at least on platforms that have an
+> equivalent of a raw device.
+
+I don't understand that claim. Not much is portable about raw
+devices, and they're typically not nearlly as well documented as the
+filesystem interfaces.
+
+> It is also independent of the standard implemented Un*x filesystems,
+> for which you will have to pay extra if you want to take extra
+> measures against power loss.
+
+Rather, it is worse. With a Unix filesystem you get quite defined
+semantics about what is written when.
+
+> The problem with e.g. e2fs, is that it is not robust enough if a CPU
+> fails.
+
+ext2fs doesn't even claim to have Unix filesystem semantics.
+
+Regards,
+
+Giles
+
+
+
