Gentoo’s dev-lang/ghc-8.2.1_rc1 supports both cross-building and cross-compiling modes! This is useful for cross-compiling Haskell software and for the initial porting of GHC itself to a new Gentoo target.
Building a GHC crosscompiler on Gentoo
Getting ${CTARGET}-ghc (crosscompiler) on Gentoo:
# # convenience variables:
CTARGET=powerpc64-unknown-linux-gnu
#
# # Installing a target toolchain: gcc, glibc, binutils
crossdev ${CTARGET}
# # Installing ghc dependencies:
emerge-${CTARGET} -1 libffi ncurses gmp
#
# # adding 'ghc' symlink to cross-overlay:
ln -s path/to/haskell/overlay/dev-lang/ghc path/to/cross/overlay/cross-${CTARGET}/ghc
#
# # Building ghc crosscompiler:
emerge -1 cross-${CTARGET}/ghc
#
powerpc64-unknown-linux-gnu-ghc --info | grep Target
# ,("Target platform","powerpc64-unknown-linux")
Cross-building GHC on Gentoo
Cross-building ghc on ${CTARGET}:
# # convenience variables:
CTARGET=powerpc64-unknown-linux-gnu
#
# # Installing a target toolchain: gcc, glibc, binutils
crossdev ${CTARGET}
# # Installing ghc dependencies:
emerge-${CTARGET} -1 libffi ncurses gmp
#
# # Cross-building ghc:
emerge-${CTARGET} --buildpkg -1 dev-lang/ghc
#
# # Now built packages can be used on a target to install
# # built ghc as: emerge --usepkg -1 dev-lang/ghc
Building a GHC crosscompiler (generic)
That’s how you get a powerpc64 crosscompiler in a fresh git checkout:
$ ./configure --target=powerpc64-unknown-linux-gnu
$ cat mk/build.mk
HADDOCK_DOCS=NO
BUILD_SPHINX_HTML=NO
BUILD_SPHINX_PDF=NO
# to speed things up
BUILD_PROF_LIBS=NO
$ make -j$(nproc)
$ inplace/bin/ghc-stage1 --info | grep Target
,("Target platform","powerpc64-unknown-linux")
Simple!
Below are details that have only historical (or backporting) value.
How did we get there?
Cross-compiling support in GHC is not a new thing. The GHC wiki has a detailed section on how to build a crosscompiler. That works quite well. You can even target ghc at m68k: porting example.
What did not work so well was the attempt to install the result! In some places the GHC build system tried to run ghc-pkg built for ${CBUILD}, in other places for ${CHOST}.
I had never really tried to install a crosscompiler before, mostly because I was usually happy to get a crosscompiler to build at all: making GHC build for a rare target usually required a patch or two.
But one day I decided to give a full install a run. The original motivation was a bit unusual: I wanted to free space on my hard drive.
The build tree for GHC usually takes about 6-8GB. I had about 15 GHC source trees lying around. All in all they took about 10% of all space on my hard drive. Fixing make install would allow me to install only the final result and get rid of all intermediate files.
I decided to test the make install code on Gentoo’s dev-lang/ghc package as a proper package.
As a result a bunch of minor cleanups happened:
- fixed NCG/llvm presence for target
- marked aarch64 as NCG/llvm target
- fixed ${CTARGET}-prefixing for hp2ps
- fixed ${CTARGET}-prefixing for noncanonical targets
- dropped ${CTARGET}-prefixing for stage2 installs (crossbuilds)
- added ${CTARGET}-prefixing for ghci
- fixed stage2 install to run only ${CBUILD} tools
- fixed all stage2 binaries to run on ${CHOST}, not ${CBUILD}
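The prefixing rules from the list above can be summed up in a few lines of shell. This is only a sketch: installed_name is a made-up helper, not something the ebuild or the GHC build system provides, and x86_64-pc-linux-gnu is just an assumed host tuple.

```shell
# installed_name CHOST CTARGET TOOL  (hypothetical helper, for illustration)
# A compiler that runs on CHOST but targets a different CTARGET is a
# crosscompiler and is installed with a ${CTARGET}- prefix.
# A cross-built compiler (CHOST == CTARGET) keeps the plain tool name.
installed_name() (
    chost=$1
    ctarget=$2
    tool=$3
    if [ "$chost" != "$ctarget" ]; then
        printf '%s-%s\n' "$ctarget" "$tool"
    else
        printf '%s\n' "$tool"
    fi
)

# Example:
#   installed_name x86_64-pc-linux-gnu powerpc64-unknown-linux-gnu ghc
```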
What works?
It allowed me to test various targets. Namely:
| Target | Bits | Endianness | Codegen |
|---|---|---|---|
| cross-aarch64-unknown-linux-gnu/ghc | 64 | LE | LLVM |
| cross-alpha-unknown-linux-gnu/ghc | 64 | LE | UNREG |
| cross-armv7a-unknown-linux-gnueabi/ghc | 32 | LE | LLVM |
| cross-hppa-unknown-linux-gnu/ghc | 32 | BE | UNREG |
| cross-m68k-unknown-linux-gnu/ghc | 32 | BE | UNREG |
| cross-mips64-unknown-linux-gnu/ghc | 32/64 | BE | UNREG |
| cross-powerpc64-unknown-linux-gnu/ghc | 64 | BE | NCG |
| cross-powerpc64le-unknown-linux-gnu/ghc | 64 | LE | NCG |
| cross-s390x-unknown-linux-gnu/ghc | 64 | BE | UNREG |
| cross-sparc-unknown-linux-gnu/ghc | 32 | BE | UNREG |
| cross-sparc64-unknown-linux-gnu/ghc | 64 | BE | UNREG |
I am running all of this on x86_64 (a 64-bit LE platform).
Quite a list! With the help of qemu we can even test whether the crosscompiler produces something that works:
$ cat hi.hs
main = print "hello!"
$ powerpc64le-unknown-linux-gnu-ghc hi.hs -o hi.ppc64le
[1 of 1] Compiling Main ( hi.hs, hi.o )
Linking hi.ppc64le ...
$ file hi.ppc64le
hi.ppc64le: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), dynamically linked, interpreter /lib64/ld64.so.2, for GNU/Linux 3.2.0, not stripped
$ qemu-ppc64le -L /usr/powerpc64le-unknown-linux-gnu/ ./hi.ppc64le
"hello!"
Many qemu targets are slightly buggy, and the bugs are usually very easy to fix!
A few recent examples:
- epoll syscall is not wired properly on qemu-alpha: patch
- CPU initialization code on qemu-s390x
- thread creation fails on qemu-sparc32plus due to simple mmap() emulation bug
- tcg on qemu-sparc64 crashes at runtime in static_code_gen_buffer()
Tweaking qemu is fun 
- use --autounmask=n
- use --backtrack=1000 (or more)
- package.mask all outdated packages that cause conflicts (usually requires more iterations)
- run world update
The problem
Occasionally (more frequently on Haskell packages) portage takes a long time only to tell you it was not able to figure out the upgrade path.
Some people suggest a "wipe-blockers-and-reinstall" solution. This post will try to explain how to actually upgrade (or find out why it’s not possible) without destroying your system.
Real-world example
I’ll take a real-world example from Gentoo’s bugzilla: bug 579520, where the original upgrade error looked like this:
!!! Multiple package instances within a single package slot have been pulled
!!! into the dependency graph, resulting in a slot conflict:
x11-libs/gtk+:3
(x11-libs/gtk+-3.18.7:3/3::gentoo, ebuild scheduled for merge) pulled in by
(no parents that aren't satisfied by other packages in this slot)
(x11-libs/gtk+-3.20.0:3/3::gnome, installed) pulled in by
>=x11-libs/gtk+-3.19.12:3[introspection?] required by (gnome-base/nautilus-3.20.0:0/0::gnome, installed)
^^ ^^^^^^^^^
>=x11-libs/gtk+-3.20.0:3[cups?] required by (gnome-base/gnome-core-libs-3.20.0:3.0/3.0::gnome, installed)
^^ ^^^^^^^^
>=x11-libs/gtk+-3.19.4:3[introspection?] required by (media-video/totem-3.20.0:0/0::gnome, ebuild scheduled for merge)
^^ ^^^^^^^^
>=x11-libs/gtk+-3.19.0:3[introspection?] required by (app-editors/gedit-3.20.0:0/0::gnome, ebuild scheduled for merge)
^^ ^^^^^^^^
>=x11-libs/gtk+-3.19.5:3 required by (gnome-base/dconf-editor-3.20.0:0/0::gnome, ebuild scheduled for merge)
^^ ^^^^^^^^
>=x11-libs/gtk+-3.19.6:3[introspection?] required by (x11-libs/gtksourceview-3.20.0:3.0/3::gnome, ebuild scheduled for merge)
^^ ^^^^^^^^
>=x11-libs/gtk+-3.19.3:3[introspection,X] required by (media-gfx/eog-3.20.0:1/1::gnome, ebuild scheduled for merge)
^^ ^^^^^^^^
>=x11-libs/gtk+-3.19.8:3[X,introspection?] required by (x11-wm/mutter-3.20.0:0/0::gnome, installed)
^^ ^^^^^^^^
>=x11-libs/gtk+-3.19.12:3[X,wayland?] required by (gnome-base/gnome-control-center-3.20.0:2/2::gnome, installed)
^^ ^^^^^^^^^
>=x11-libs/gtk+-3.19.11:3[introspection?] required by (app-text/gspell-1.0.0:0/0::gnome, ebuild scheduled for merge)
^^ ^^^^^^^^^
x11-base/xorg-server:0
(x11-base/xorg-server-1.18.3:0/1.18.3::gentoo, installed) pulled in by
x11-base/xorg-server:0/1.18.3= required by (x11-drivers/xf86-video-nouveau-1.0.12:0/0::gentoo, installed)
^^^^^^^^^^
x11-base/xorg-server:0/1.18.3= required by (x11-drivers/xf86-video-fbdev-0.4.4:0/0::gentoo, installed)
^^^^^^^^^^
x11-base/xorg-server:0/1.18.3= required by (x11-drivers/xf86-input-evdev-2.10.1:0/0::gentoo, installed)
^^^^^^^^^^
(x11-base/xorg-server-1.18.2:0/1.18.2::gentoo, ebuild scheduled for merge) pulled in by
x11-base/xorg-server:0/1.18.2= required by (x11-drivers/xf86-video-vesa-2.3.4:0/0::gentoo, installed)
^^^^^^^^^^
x11-base/xorg-server:0/1.18.2= required by (x11-drivers/xf86-input-synaptics-1.8.2:0/0::gentoo, installed)
^^^^^^^^^^
x11-base/xorg-server:0/1.18.2= required by (x11-drivers/xf86-input-mouse-1.9.1:0/0::gentoo, installed)
^^^^^^^^^^
app-text/poppler:0
(app-text/poppler-0.32.0:0/51::gentoo, ebuild scheduled for merge) pulled in by
>=app-text/poppler-0.32:0/51=[cxx,jpeg,lcms,tiff,xpdf-headers(+)] required by (net-print/cups-filters-1.5.0:0/0::gentoo, installed)
^^^^^^
>=app-text/poppler-0.16:0/51=[cxx] required by (app-office/libreoffice-5.0.5.2:0/0::gentoo, installed)
^^^^^^
>=app-text/poppler-0.12.3-r3:0/51= required by (app-text/texlive-core-2014-r4:0/0::gentoo, installed)
^^^^^^
(app-text/poppler-0.42.0:0/59::gentoo, ebuild scheduled for merge) pulled in by
>=app-text/poppler-0.33[cairo] required by (app-text/evince-3.20.0:0/evd3.4-evv3.3::gnome, ebuild scheduled for merge)
^^ ^^^^
net-fs/samba:0
(net-fs/samba-4.2.9:0/0::gentoo, installed) pulled in by
(no parents that aren't satisfied by other packages in this slot)
(net-fs/samba-3.6.25:0/0::gentoo, ebuild scheduled for merge) pulled in by
net-fs/samba[smbclient] required by (media-sound/xmms2-0.8-r2:0/0::gentoo, ebuild scheduled for merge)
^^^^^^^^^
It may be possible to solve this problem by using package.mask to
prevent one of those packages from being selected. However, it is also
possible that conflicting dependencies exist such that they are
impossible to satisfy simultaneously. If such a conflict exists in
the dependencies of two different packages, then those packages can
not be installed simultaneously.
For more information, see MASKED PACKAGES section in the emerge man
page or refer to the Gentoo Handbook.
emerge: there are no ebuilds to satisfy ">=dev-libs/boost-1.55:0/1.57.0=".
(dependency required by "app-office/libreoffice-5.0.5.2::gentoo" [installed])
(dependency required by "@selected" [set])
(dependency required by "@world" [argument])
A lot of text! Let’s trim it down to the essential detail first (AKA how to actually read it). I’ve dropped the "cause" of conflicts from the previous listing and left only the problematic packages:
!!! Multiple package instances within a single package slot have been pulled
!!! into the dependency graph, resulting in a slot conflict:
x11-libs/gtk+:3
(x11-libs/gtk+-3.18.7:3/3::gentoo, ebuild scheduled for merge) pulled in by
(x11-libs/gtk+-3.20.0:3/3::gnome, installed) pulled in by
x11-base/xorg-server:0
(x11-base/xorg-server-1.18.3:0/1.18.3::gentoo, installed) pulled in by
(x11-base/xorg-server-1.18.2:0/1.18.2::gentoo, ebuild scheduled for merge) pulled in by
app-text/poppler:0
(app-text/poppler-0.32.0:0/51::gentoo, ebuild scheduled for merge) pulled in by
(app-text/poppler-0.42.0:0/59::gentoo, ebuild scheduled for merge) pulled in by
net-fs/samba:0
(net-fs/samba-4.2.9:0/0::gentoo, installed) pulled in by
(net-fs/samba-3.6.25:0/0::gentoo, ebuild scheduled for merge) pulled in by
emerge: there are no ebuilds to satisfy ">=dev-libs/boost-1.55:0/1.57.0=".
That is more manageable. We have 4 "conflicts" here and one "missing" package.
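The trimming above can be approximated mechanically. A minimal sketch, assuming the report has the layout shown in the listings: trim_conflicts is a hypothetical helper (not a portage tool) that drops the "required by" lines, the caret markers and the "no parents" notes, and leaves everything else (including the generic advice paragraph) alone.

```shell
# trim_conflicts: hypothetical filter for an emerge slot-conflict report.
# Reads the report on stdin and drops:
#   - lines naming the "cause" of a conflict ("... required by ...")
#   - lines made only of caret markers and whitespace
#   - "(no parents that aren't satisfied ...)" notes
trim_conflicts() {
    grep -E -v \
        -e 'required by' \
        -e '^[[:space:]^]+$' \
        -e 'no parents that'
}

# Possible usage (assumes a world update that fails with conflicts):
#   emerge -pvuDN @world 2>&1 | trim_conflicts
```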
Note: all the listed requirements under "conflicts" (the previous listing) suggest these are >= depends. Thus the dependencies themselves can’t block the upgrade path and the error message is misleading.
For us it means that portage keeps old versions of gtk+ (and others) for some unknown reason.
To explore that situation we need to somehow hide outdated packages from portage and retry an update. It will be the same as uninstalling and reinstalling a package, but without actually doing it.
package.mask does exactly that. To make a package hidden for real we will also need to disable autounmask: --autounmask=n (default is y).
Let’s hide outdated x11-libs/gtk+-3.18.7 package from portage:
# /etc/portage/package.mask
<x11-libs/gtk+-3.20.0:3
The blocker list became shorter but still does not make sense:
x11-base/xorg-server:0
(x11-base/xorg-server-1.18.2:0/1.18.2::gentoo, ebuild scheduled for merge) pulled in by
(x11-base/xorg-server-1.18.3:0/1.18.3::gentoo, installed) pulled in by
^^^^^^^^^^
app-text/poppler:0
(app-text/poppler-0.32.0:0/51::gentoo, ebuild scheduled for merge) pulled in by
(app-text/poppler-0.42.0:0/59::gentoo, ebuild scheduled for merge) pulled in by
Blocking more old stuff:
# /etc/portage/package.mask
<x11-libs/gtk+-3.20.0:3
<x11-base/xorg-server-1.18.3
<app-text/poppler-0.42.0
The output is:
[blocks B ] <dev-util/gdbus-codegen-2.48.0 ("<dev-util/gdbus-codegen-2.48.0" is blocking dev-libs/glib-2.48.0)
* Error: The above package list contains packages which cannot be
* installed at the same time on the same system.
(dev-libs/glib-2.48.0:2/2::gentoo, ebuild scheduled for merge) pulled in by
(dev-util/gdbus-codegen-2.46.2:0/0::gentoo, ebuild scheduled for merge) pulled in by
That’s our blocker! Stable <dev-util/gdbus-codegen-2.48.0 blocks unstable dev-libs/glib-2.48.0.
The solution is to ~arch keyword dev-util/gdbus-codegen-2.48.0:
# /etc/portage/package.accept_keywords
dev-util/gdbus-codegen
And run world update.
Simple!
On rare arches ghc-7.8.3 behaves a bit badly:
- ia64 build stopped being able to link itself after ghc-7.4 (gprel overflow)
- on sparc, ia64 and ppc ghc was not able to create working shared libraries
- integer-gmp library on ia64 crashed, and we had to use integer-simple
I have written a small story of those fixes here if you are curious.
TL;DR:
To get ghc-7.8.3 working nicer for exotic arches you will need to backport at least the following patches:
Thank you!
A bit later I decided to actually look at the failure case (reported on the darcs bugtracker) and do something about it. My idea for debugging the mystery was simple: reproduce the difference on the same source for ghc-7.6/7.8 and keep plugging in debug info until a difference I could understand popped up.
Darcs has a great debug-verbose option for most commands. I used the debugMessage function to litter the code with more debugging statements until the complete horrible picture emerged.
As you can see in the bugtracker issue, I posted there various intermediate guesses at what I thought went wrong (don’t expect those comments to make much sense).
The immediate consequence of the breakage was a downloaded file being overwritten by a second, partial download. The event timeline looked simple:
- darcs scheduled for download the same file twice (two jobs in download queue)
- first download job did finish
- notified waiter started processing of that downloaded temp file
- second download started truncating previous complete download
- notified waiter continued processing partially downloaded file and detected breakage
Thus first I decided to fix the consequence. It did not fix the problems completely: sometimes darcs pull still complained about remote repositories being broken (missing files), but it made the errors saner (only the remote side was allegedly at fault).
Ideally, that file overwrite should not happen in the first place. Part of the cause was temp file name predictability.
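One common way to make such an overwrite harmless is to download into a uniquely named temp file and rename it into place only once it is complete. The sketch below is plain shell and is not the darcs fix; store_atomically is a hypothetical name, and it assumes the temp file and the destination live on the same filesystem so that the rename is atomic.

```shell
# store_atomically DEST  (hypothetical helper, for illustration)
# Writes stdin to a unique temp file next to DEST, then renames it over
# DEST. Two concurrent writers never share a temp file, and a reader
# opening DEST gets either the old or the new complete file.
store_atomically() (
    dest=$1
    tmp=$(mktemp "${dest}.XXXXXX") || exit 1
    if cat > "$tmp"; then
        mv -f "$tmp" "$dest"
    else
        rm -f "$tmp"
        exit 1
    fi
)
```

Two jobs fetching the same file would then each write their own temp file, and the second rename would replace a complete file with another complete file.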
But, OK. Then I started digging into why the 7.6/7.8 download request patterns were so severely different. At first I thought the new IO manager was the cause of the difference. The paper says it fixed a Haskell thread scheduling issue (the paper is nice even for leisure reading!):
GHC’s RTS had a bug in which yield placed the thread back on the front of the run queue. This bug was uncovered by our use of yield which requires that the thread be placed at the end of the run queue
Thus I was expecting the bug from this side.
Then, being determined to dig A Lot in the darcs source code, I decided to disable optimizations (-O0) to speed up rebuilds. And the bug vanished.
That made it click: unsafePerformIO might be the real problem. I grepped for all unsafePerformIO instances and examined all definition sites.
Two were especially interesting:
-- src/Darcs/Util/Global.hs
-- ...
_crcWarningList :: IORef CRCWarningList
_crcWarningList = unsafePerformIO $ newIORef []
{-# NOINLINE _crcWarningList #-}
-- ...
_badSourcesList :: IORef [String]
_badSourcesList = unsafePerformIO $ newIORef []
{- NOINLINE _badSourcesList -}
-- ...
Did you spot the bug?
Thus The Proper Fix was pushed upstream a month ago. The bug showed up because ghc is now able to inline things more aggressively: without a working NOINLINE pragma, _badSourcesList was inlined at all use sites, so the updates were thrown away.
I don’t know if those newIORef [] can be de-CSEd if types would have the same representation. Ideally the module also needs -fno-cse, or get rid of unsafePerformIO completely :].
(Side thought: top-level global variables in C style are surprisingly non-trivial in "pure" haskell. They are easy to use via peek / poke (in a racy way), but are hard to declare / initialize.)
I wondered how many Haskell packages manage to misspell ghc pragma declarations the way darcs did. And there still _are_ a few such offenders:
$ fgrep -R NOINLINE . | grep -v '{-# NOINLINE' | grep '{-'
--
ajhc-0.8.0.10/lib/jhc/Jhc/List.hs:{- NOINLINE filterFB #-}
ajhc-0.8.0.10/lib/jhc/Jhc/List.hs:{- NOINLINE iterateFB #-}
ajhc-0.8.0.10/lib/jhc/Jhc/List.hs:{- NOINLINE mapFB #-}
--
darcs-2.8.4/src/Darcs/Global.hs:{- NOINLINE _badSourcesList -}
darcs-2.8.4/src/Darcs/Global.hs:{- NOINLINE _reachableSourcesList -}
--
dph-lifted-copy-0.7.0.1/Data/Array/Parallel.hs:{- NOINLINE emptyP #-}
--
dph-par-0.5.1.1/Data/Array/Parallel.hs:{- NOINLINE emptyP #-}
--
dph-seq-0.5.1.1/Data/Array/Parallel.hs:{- NOINLINE emptyP #-}
--
freesect-0.8/FreeSectAnnotated.hs:{- # NOINLINE showSSI #-}
freesect-0.8/FreeSectAnnotated.hs:{- # NOINLINE FreeSectAnnotated.showSSI #-}
freesect-0.8/FreeSect.hs:{- # NOINLINE fs_warn_flaw #-}
--
http-proxy-0.0.8/Network/HTTP/Proxy/ReadInt.hs:{- NOINLINE readInt64MH #-}
http-proxy-0.0.8/Network/HTTP/Proxy/ReadInt.hs:{- NOINLINE mhDigitToInt #-}
--
lhc-0.10/lib/base/src/GHC/PArr.hs:{- NOINLINE emptyP #-}
--
property-list-0.1.0.2/src/Data/PropertyList/Binary/Float.hs:{- NOINLINE doubleToWord64 -}
property-list-0.1.0.2/src/Data/PropertyList/Binary/Float.hs:{- NOINLINE word64ToDouble -}
property-list-0.1.0.2/src/Data/PropertyList/Binary/Float.hs:{- NOINLINE floatToWord32 -}
property-list-0.1.0.2/src/Data/PropertyList/Binary/Float.hs:{- NOINLINE word32ToFloat -}
--
warp-2.0.3.3/Network/Wai/Handler/Warp/ReadInt.hs:{- NOINLINE readInt64MH #-}
warp-2.0.3.3/Network/Wai/Handler/Warp/ReadInt.hs:{- NOINLINE mhDigitToInt #-}
Looks like there is yet something to fix :]
It would be great if hlint were able to detect pragma-like comments and warn when the comment contents are a valid pragma but the comment brackets don’t allow it to fire.
{- NOINLINE foo -} -- bad
{- NOINLINE foo #-} -- bad
{-# NOINLINE foo -} -- bad
{-# NOINLINE foo #-} -- ok
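Until a linter does this, a slightly stricter grep than the one above can catch all three bad spellings. This is a rough sketch, not an hlint feature: find_broken_pragmas is a hypothetical helper, it only looks at INLINE/NOINLINE, and it assumes the whole pragma sits on one line.

```shell
# find_broken_pragmas: hypothetical check, reads Haskell source on stdin.
# Step 1 keeps lines where a comment opener ({- or {-#, possibly with a
# stray space before '#') is followed by INLINE or NOINLINE.
# Step 2 drops the lines where the pragma is well-formed: {-# ... #-}
find_broken_pragmas() {
    grep -E '\{-#?[[:space:]]*#?[[:space:]]*(NO)?INLINE[[:space:]]' |
        grep -E -v '\{-#[[:space:]]*(NO)?INLINE[[:space:]][^}]*#-\}'
}

# Possible usage over a source tree:
#   find . -name '*.hs' -exec cat {} + | find_broken_pragmas
```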
Thanks for reading!
As a packager I was especially interested in the following features:
- GHCi (and dynamic linking) on unregisterised arches, like ia64 and powerpc64
- jobs argument for ghc make. Parallel builds for free.
- what did seriously break, what was fixed?
First off, -rc1 is packaged in the gentoo-haskell overlay (not keyworded, as quite a few packages fail to build against ghc-7.8).
GHCi (and dynamic linking)
Dynamic linking works like a charm! GHCi loads binaries noticeably faster. Let’s test it! The simplest synthetic test: how fast do you get a prompt from the interpreter?
# ghc-7.6:
$ time { echo '1+1' | ghci -package yesod-core >/dev/null; }
real 0m0.626s
user 0m0.550s
sys 0m0.074s
# ghc-7.8:
$ time { echo '1+1' | ghci -package yesod-core >/dev/null; }
real 0m0.209s
user 0m0.172s
sys 0m0.034s
This is the case when files are cached in RAM. About 3 times faster. The same boost should apply every time you compile something template-haskell related.
jobs argument for ghc make
I went ahead and tried to enable it for all ebuilds.
For some reason ghc eats a lot of system time in that mode. Likely jobs without an argument is not a very good idea and I’ll need to limit it to the minimum of the MAKEOPTS value and some N (Cabal picked 64).
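The limiting could look roughly like this. It is a sketch under assumptions: ghc_jobs is a hypothetical helper (not an eclass function), it only understands the short -jN form in MAKEOPTS, and it falls back to 1 job when no number is given.

```shell
# ghc_jobs MAKEOPTS [CAP]  (hypothetical helper, for illustration)
# Prints min(N, CAP), where N comes from "-jN" in MAKEOPTS.
# CAP defaults to 64; without a numeric -jN the result is 1.
ghc_jobs() (
    makeopts=$1
    cap=${2:-64}
    n=$(printf '%s\n' "$makeopts" |
        sed -n 's/.*-j[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    [ -n "$n" ] || n=1
    [ "$n" -le "$cap" ] || n=$cap
    printf '%s\n' "$n"
)

# Possible usage:
#   ghc --make -j"$(ghc_jobs "$MAKEOPTS")" Main.hs
```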
Even in this mode a 2x speedup is visible on large packages.
So what did break?
Not _that_ much, actually.
alex and happy generated parsers
All package maintainers who ship lexers generated by alex and parsers generated by happy are strongly advised to update those tools locally and reissue hackage update, as old parsers do not compile against ghc-7.8.
If you happen to use low-level
(==#) :: Int# -> Int# -> Bool
primitives, you might need to port your code a bit, as now their type is a bit different:
(==#) :: Int# -> Int# -> Int#
Here is our example fix for arithmoi.
Type inference changed a bit.
Traditionally darcs needed a patch :] In that big mostly dumb patch most interesting bit is explicit assignment:
- where copyRepo =
+ where copyRepo :: IO ()
+ copyRepo =
An even more amusing breakage was in shake, where the error was an inability to infer an Addr# argument. No idea whether it was a bug or a feature.
Unsafe removals
As we’ve seen in the darcs patch, many unsafe${something} functions went away from Foreign modules down to their Unsafe counterparts.
Typeable
Typeable representation changed in a substantial way, thus advanced generic stuff will break. I have no example fix, but have a few broken packages, like dependent-sum.
Hashtable gone from base
Example of a fix for the frag package. By the way, ghc-7.6 used to eat 8GB of RAM compiling frag. For ghc-7.8 700MB was enough, even with 8 build threads.
Compiler itself
The thing I expected to try didn’t compile: unregisterised arches and GHCi on them.
I’ve hacked-up a workaround to make them build, but in threaded RTS mode it still SIGSEGVs.
STG gurus are welcome to help me :]
I have fundamental questions like:
- can unregisterised builds support SMP in theory? (via __thread attribute for example)
- did UNREG ever produce working threaded runtime?
$ cat __foo/foo.hs
main = print 1
# non-threaded works, as it always has
$ inplace/bin/ghc-stage1 --make __foo/foo.hs -threaded -debug -fforce-recomp
#
$ gdb --args ./__foo/foo +RTS -D{s,i,w,g,G,b,S,t,p,a,l,m,z,c,r}
...
(gdb) run
...
7ffff7fb9700: resuming capability 0
7ffff7fb9700: cap 0: created thread 1
7ffff7fb9700: new bound thread (1)
7ffff7fb9700: cap 0: schedule()
7ffff7fb9700: cap 0: running thread 1 (ThreadRunGHC)
Jumping to 0x7ec17f
#
Program received signal SIGSEGV, Segmentation fault.
0x00000000007ec1a2 in stg_returnToStackTop ()
(gdb) bt
#0 0x00000000007ec1a2 in stg_returnToStackTop ()
#1 0x00000000007d26d9 in StgRun (f=0x7ec17f , basereg=0xca0648) at rts/StgCRun.c:81
#2 0x00000000007c7a30 in schedule (initialCapability=0xca0630, task=0xcc3b30) at rts/Schedule.c:463
#3 0x00000000007ca2c4 in scheduleWaitThread (tso=0x7ffff6b05390, ret=0x0, pcap=0x7fffffffd218) at rts/Schedule.c:2346
#4 0x00000000007c0162 in rts_evalIO (cap=0x7fffffffd218, p=0xb61450 , ret=0x0) at rts/RtsAPI.c:459
#5 0x00000000007e04c3 in ioManagerStartCap (cap=0x7fffffffd218) at rts/posix/Signals.c:184
#6 0x00000000007e04f6 in ioManagerStart () at rts/posix/Signals.c:194
#7 0x00000000007d1d5d in hs_init_ghc (argc=0xc96570 , argv=0xc96578 , rts_config=...) at rts/RtsStartup.c:262
#8 0x00000000007d000b in real_main () at rts/RtsMain.c:47
#9 0x00000000007d0122 in hs_main (argc=17, argv=0x7fffffffd418, main_closure=0xb527a0 , rts_config=...) at rts/RtsMain.c:114
#10 0x0000000000404df1 in main ()
Looks like CurrentTSO is complete garbage. Should not happen :]
Conclusion
The experience is positive. I already get bored when I see a single-threaded make of ghc-7.6 and want to update the compiler.
Things like yesod, darcs, hoogle, pandoc and xmonad build fine, thus you can get a working environment very fast.
Package authors are more eager to fix stuff for this release: it turns bug lookup and benchmarking into a very interactive process.
I want to thank All Of You for pushing Haskell forward!
Thank you!
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... ghc: /usr/lib64/ghc-7.4.2/base-4.5.1.0/HSbase-4.5.1.0.o: unknown symbol `stat'
ghc: unable to load package `base'
But the bug was most popular across rare gentoo users. An interesting correlation!
This post is about the root of this problem: the gory implementation details of ghci dynamic loader down to libC and even ELF symbols!
Not scared? Fasten your belts and Read On!
GHC (this one) is both:
- a compiler (ghc binary)
- and REPL (ghci binary)
Simply put, ghc allows you to create final binaries out of Haskell sources, while ghci allows runtime loading of Haskell sources.
A typical session starts like this:
$ ghci
GHCi, version 7.6.3: https://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> <some code to evaluate>
ghci’s implementation allows loading an arbitrary shared library:
$ ghci -lpcre
GHCi, version 7.6.3: https://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading object (dynamic) /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.1/../../../../lib64/libpcre.so ... done
final link ... done
Prelude>
and even an object file:
$ echo 'void foo(){}' > a.c &&
gcc -c a.c -o a.o &&
ghci a.o
GHCi, version 7.6.3: https://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading object (static) a.o ... done
final link ... done
Prelude>
ghci libraries are basically the same object files built of many source files.
It took me a while to reproduce the bug mentioned at the very start of the post. The first time I heard of the bug was in December 2012 from nand, but I had no idea where it came from.
First I thought it was a problem of missing headers somewhere in C code due to a glibc upgrade, but no matter what combinations of binutils/gcc/glibc I tried, the bug did not want to show up.
6 months later, after some bug reports got collected, I noticed the dreadful CFLAGS=-Os common amongst reporters, which was the trigger.
Let’s explore the difference in undefined symbols between 2 files:
- CFLAGS=-O2 /usr/lib64/ghc-7.6.3/base-4.6.0.1/HSbase-4.6.0.1.o
- CFLAGS=-Os /usr/lib64/ghc-7.6.3/base-4.6.0.1/HSbase-4.6.0.1.o
$ nm --undefined-only /usr/lib64/ghc-7.6.3/base-4.6.0.1/HSbase-4.6.0.1.o > base-O2
$ nm --undefined-only /gentoo/chroots/amd64-unstable//usr/lib64/ghc-7.6.3/base-4.6.0.1/HSbase-4.6.0.1.o > base-Os
$ diff -U0 base-O2 base-Os
- U __fxstat
+ U fstat
- U __lxstat
+ U lstat
- U memset
- U __xstat
+ U stat
Do you see it?
The -Os build has a stat call while the -O2 build has __xstat. It was the right track.
Let’s try a simpler example:
cat >stat-test.c <<-EOF
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
int f()
{
struct stat s;
return stat("/", &s);
}
EOF
$ gcc -c stat-test.c -Os -o stat-test-Os.o
$ gcc -c stat-test.c -O2 -o stat-test-O2.o
$ nm --undefined-only stat-test-O[2s].o
stat-test-O2.o:
U __xstat
stat-test-Os.o:
U stat
The symbols differ on different types of optimization. Let’s see if ghci treats them differently:
$ ghci stat-test-O2.o
...
Loading object (static) stat-test-O2.o ... done
final link ... done
$ ghci stat-test-Os.o
...
final link ... ghc: a.o: unknown symbol `stat'
linking extra libraries/objects failed
And it does! It means that __xstat comes from libc.so.6, but stat comes from somewhere else.
After some investigation I found its definition:
$ nm --defined-only --extern-only /usr/lib/libc_nonshared.a
...
stat.oS:
00000000 T __i686.get_pc_thunk.bx
00000000 W stat
00000000 T __stat
...
We see here the file defining two symbols for us:
- global weak stat (the one we really need)
- global __stat (useless and potentially harmful as it might lead to symbol collision)
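To check quickly whether an object file would hit this problem, one can scan its undefined symbols for the unwrapped names. A minimal sketch: nonshared_stat_refs is a hypothetical helper, and the list of names is limited to the three seen in the diff above (libc_nonshared.a may carry more).

```shell
# nonshared_stat_refs: hypothetical check, reads `nm --undefined-only`
# output on stdin and prints undefined references to stat, fstat or
# lstat, i.e. the names that are not exported by libc.so.6 itself here.
nonshared_stat_refs() {
    awk '$1 == "U" && $2 ~ /^(stat|fstat|lstat)$/ { print $2 }'
}

# Possible usage:
#   nm --undefined-only stat-test-Os.o | nonshared_stat_refs
```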
Now we can build-up working stat-test-Os.o by linking that weak stat symbol to our ghci:
$ ar x /usr/lib/libc_nonshared.a
$ mv stat.oS stat.o # ghci dislikes non-'*.o' extensions for object files
$ ghci a.o stat.o
Loading object (static) a.o ... done
Loading object (static) stat.o ... done
final link ... ghc: stat.o: unknown symbol `stat'
Almost works! Well, no. Nothing changed. But the reason is missing support for loading weak symbols in ghci, which is known as bug 3333.
I’ve pulled a series of patches by akio and updated it for ghc-7.6.3 (the result).
After pulling that patch into ghc I got the previous example to load on x86_64:
$ ar x /usr/lib/libc_nonshared.a
$ mv stat.oS stat.o # ghci dislikes non-'*.o' extensions for object files
$ ghci a.o stat.o
Loading object (static) a.o ... done
Loading object (static) stat.o ... done
final link ... done
Ideally, I should load all those and only those libc_nonshared.a symbols into ghci as a first library. I’ve decided to piggyback on the ghc-prim module and stuff all those nonshared symbols there.
Perhaps that bit of shell is the worst piece of code I have ever written. It weakens all needed symbols, localizes all the rest, and merges the result into ghc-prim. It is also known to break on i386, as I have hidden the GOT and module base required for PIC code.
ghci’s loader interface is used not only for interactive use but also for TemplateHaskell (where we have seen the vector package failing to compile).
This post is a great example of why using the native system loader, as proposed in bug 4244, is a good idea.
I don’t know if it fixes all our cases but it will be way cleaner than it is now.
If the workaround proves too fragile I’ll have to fall back to forcing CFLAGS=-O2 when building ghc.
UPDATE: Now x86 works as well: libc_nonshared.a contained PIC code, thus I’ve picked the implementation from libc.a directly.
Our workaround even passes test from bug 7072!
The most interesting projects are:
Thanks!
If you happen to be involved in using/developing Haskell-powered software you might like to answer our poll on that matter.
Thanks in advance!
Here is a small incomplete HOWTO for Gentoo users on how to build a crosscompiler running on an x86_64 host and targeting the ia64 platform.
It is just an example. You can pick any target.
First of all you need to enable the haskell overlay and install the host compiler:
# GHC_IS_UNREG=yeah emerge -av =ghc-7.6.1
The GHC_IS_UNREG=yeah bit is critical. If we don’t set it, the GHC build system will try to build a registerised stage1 (which is a crosscompiler already).
Not setting GHC_IS_UNREG breaks the build in two ways:
- GHC will try to optimize generated bitcode with llvm’s optimizer, which will produce x86_64 instructions, not ia64.
- GHC will try to run the (broken on ia64) object splitter perl script: ghc-split.lprl.
The rest is rather simple:
# crossdev ia64-unknown-linux-gnu
# ia64-unknown-linux-gnu-emerge sys-libs/ncurses virtual/libffi dev-libs/gmp
# ln -s ${haskell_overlay}/haskell/dev-lang/ghc ${cross_overlay}/ia64-unknown-linux-gnu/ghc
# cd ${cross_overlay}/ia64-unknown-linux-gnu/ghc
# EXTRA_ECONF=--enable-unregisterised USE=ghcmakebinary ebuild ghc-9999.ebuild compile
It will fail, as the following command tries to run an ia64 binary on the x86_64 host:
libraries/integer-gmp/cbits/mkGmpDerivedConstants > libraries/integer-gmp/cbits/GmpDerivedConstants.h
I logged in to an ia64 box and ran mkGmpDerivedConstants to get a GmpDerivedConstants.h. I added the result to ${WORKDIR} and reran the last command.
After the build finished I got a crosscompiler:
sf ghc-9999 # "inplace/bin/ghc-stage1" --info
[("Project name","The Glorious Glasgow Haskell Compilation System")
,("GCC extra via C opts"," -fwrapv")
,("C compiler command","/usr/bin/ia64-unknown-linux-gnu-gcc")
,("C compiler flags"," -fno-stack-protector -Wl,--hash-size=31 -Wl,--reduce-memory-overheads")
,("ld command","/usr/bin/ia64-unknown-linux-gnu-ld")
,("ld flags"," --hash-size=31 --reduce-memory-overheads")
,("ld supports compact unwind","YES")
,("ld supports build-id","YES")
,("ld is GNU ld","YES")
,("ar command","/usr/bin/ar")
,("ar flags","q")
,("ar supports at file","YES")
,("touch command","touch")
,("dllwrap command","/bin/false")
,("windres command","/bin/false")
,("perl command","/usr/bin/perl")
,("target os","OSLinux")
,("target arch","ArchUnknown")
,("target word size","8")
,("target has GNU nonexec stack","True")
,("target has .ident directive","True")
,("target has subsections via symbols","False")
,("Unregisterised","YES")
,("LLVM llc command","llc")
,("LLVM opt command","opt")
,("Project version","7.7.20130118")
,("Booter version","7.6.1")
,("Stage","1")
,("Build platform","x86_64-unknown-linux")
,("Host platform","x86_64-unknown-linux")
,("Target platform","ia64-unknown-linux")
,("Have interpreter","NO")
,("Object splitting supported","NO")
,("Have native code generator","NO")
,("Support SMP","NO")
,("Tables next to code","NO")
,("RTS ways","l debug thr thr_debug thr_l thr_p ")
,("Dynamic by default","NO")
,("Leading underscore","NO")
,("Debug on","False")
,("LibDir","/var/tmp/portage/cross-ia64-unknown-linux-gnu/ghc-9999/work/ghc-9999/inplace/lib")
,("Global Package DB","/var/tmp/portage/cross-ia64-unknown-linux-gnu/ghc-9999/work/ghc-9999/inplace/lib/package.conf.d")
]
# cat a.hs
main = print 1
# "inplace/bin/ghc-stage1" a.hs -fforce-recomp -o a
[1 of 1] Compiling Main ( a.hs, a.o )
Linking a ...
# file a
a: ELF 64-bit LSB executable, IA-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.16, not stripped
# LANG=C ls -lh a
-rwxr-xr-x 1 root portage 24M Jan 20 02:24 a
on ia64:
$ ./a
1
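When scripting around a freshly built crosscompiler it can be handy to pull a single field out of the `--info` dump shown earlier. A minimal sketch, assuming the output keeps the `("key","value")` line layout seen above; `ghc_info_field` is a hypothetical helper name, not something GHC ships:

```shell
# Hypothetical helper: print the value of one field of `ghc --info`.
# $1    - field name, e.g. "Target platform" (treated as a regex fragment)
# stdin - the `--info` output
ghc_info_field() {
    key=$1
    # Each line looks like: [("key","value")  or  ,("key","value")
    sed -n "s/^[[ ,]*(\"${key}\",\"\(.*\)\")\$/\1/p"
}

# Usage (the path is the in-tree stage1 compiler from the session above):
#   inplace/bin/ghc-stage1 --info | ghc_info_field "Target platform"
#   inplace/bin/ghc-stage1 --info | ghc_info_field "Unregisterised"
```

That makes it easy to refuse to continue when the compiler targets something unexpected or is not an unregisterised build.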
Results:
- It’s not that hard to build a ghc with some exotic target if you have gcc there.
- mkGmpDerivedConstants needs to be more cross-compiler friendly. It should be really simple to implement: it only queries data sizes/offsets. I think autotools is already able to do it.
- GHC should be able to run llvm with the correct -mtriple in the crosscompiler case. That way we would get a registerised crosscompiler.
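To illustrate the mkGmpDerivedConstants point: sizes can be found without running any target code, because a probe only has to compile. The sketch below is similar in spirit to what autoconf does when cross-compiling, but it is not autoconf's actual code, and `sizeof_by_compiling` is a hypothetical helper name:

```shell
# Hypothetical helper: print sizeof(TYPE) for the target of a (cross-)compiler.
# $1 - compiler, e.g. ia64-unknown-linux-gnu-gcc
# $2 - C type to probe, e.g. "long"
# $3 - optional "#include <...>" line(s) needed to declare the type
sizeof_by_compiling() {
    cc=$1
    type=$2
    headers=${3:-}
    tmp=$(mktemp -d) || return 1
    n=1
    while [ "$n" -le 64 ]; do
        # A negative array size is a compile error, so this file
        # compiles only when the guessed size n is the right one.
        printf '%s\ntypedef char probe[(sizeof(%s) == %d) ? 1 : -1];\n' \
            "$headers" "$type" "$n" > "$tmp/probe.c"
        if "$cc" -c "$tmp/probe.c" -o "$tmp/probe.o" 2>/dev/null; then
            rm -rf "$tmp"
            echo "$n"
            return 0
        fi
        n=$((n + 1))
    done
    rm -rf "$tmp"
    return 1
}

# Usage:
#   sizeof_by_compiling ia64-unknown-linux-gnu-gcc long
```

Struct field offsets can be probed the same way by putting an `offsetof` expression in place of `sizeof`.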
Some TODOs:
In order to coexist with the native compiler, ghc should stop mangling the --target=ia64-unknown-linux-gnu option passed by the user and name the resulting compiler ia64-unknown-linux-gnu-ghc, not ia64-unknown-linux-ghc.
That way I could have many flavours of compiler for one target. For example I would like to have x86_64-pc-linux-gnu-ghc as a registerised compiler and x86_64-unknown-linux-gnu-ghc as an unreg one.
And yes, they will all be tracked by gentoo’s package manager.
Some notes and events in the overlay:
- ghc-7.6.1 is available for all major arches we try to support
- a few ebuilds in the overlay were converted to EAPI=5 to use subslot depends (see below)
- we’ve got a working ghc-9999 ebuild with shared libraries by default! (see below)
ghc-7.6
That beast brought two major problems to its users:
- Prelude.catch is gone; it is now called ‘System.IO.Error.catchIOError’
- the directory package changed the interface of the existing function ‘getModificationTime’ without keeping an old compatible variant.
While the first breakage is easy to fix by something like:
#if MIN_VERSION_base(4,6,0)
catch :: IO a -> (IOError -> IO a) -> IO a
catch = System.IO.Error.catchIOError
#endif
(or just switch to extensible-exceptions package if you need support for really old ghc versions).
The second one is literally a disaster:
-getModificationTime :: FilePath -> IO ClockTime
+getModificationTime :: FilePath -> IO UTCTime
It is not as straightforward to fix, and "fixes" in various packages break PVP in a very funny way.
Look at this example.
Now that package has a different type signature depending on which directory version it happened to be built against.
TODO: find a nice and simple ‘:: ClockTime -> IO UTCTime’ compatibility function to end that creeping mess. (I wish the directory package provided one.)
Okay. Enough ranting.
EAPI=5
Some experienced gentoo haskell users already know about the magic haskell-updater tool written by Ivan to fix the mess after a ghc upgrade or an upgrade of some base library.
A typical symptom of broken libraries is a ghc-pkg check result like this:
There are problems in package data-accessor-monads-fd-0.2.0.3:
dependency "monads-fd-0.1.0.4-830f79a91000e99707aac145b972f786" doesn't exist
There are problems in package LibZip-0.10.2:
dependency "mtl-2.0.1.0-b1b6de8085e5ea10cc0eb01054b69110" doesn't exist
There are problems in package jail-0.0.1.1:
dependency "monads-fd-0.1.0.4-830f79a91000e99707aac145b972f786" doesn't exist
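A report like the one above is easy to turn into a plain list of broken packages. A minimal sketch, assuming the "There are problems in package ..." header lines keep this exact wording; `broken_packages` is a hypothetical helper name:

```shell
# Hypothetical helper: read `ghc-pkg check` output on stdin and
# print the name-version of every package reported as broken.
broken_packages() {
    # Keep only the header lines and strip everything but the package id.
    # LC_ALL=C makes the sort order independent of the user's locale.
    sed -n 's/^There are problems in package \(.*\):$/\1/p' | LC_ALL=C sort -u
}

# Usage (2>&1 because the report may be written to stderr):
#   ghc-pkg check 2>&1 | broken_packages
```

This only names the victims; it does not rebuild anything, which is the part haskell-updater takes care of.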
Why does it happen?
Well, the ABI of a library built by ghc depends on the ABIs of all the libraries it uses. This has quite nasty consequences.
Once you upgrade a library you need to:
- rebuild all the reverse dependencies
- and their reverse dependencies (recursive)
The first point can be solved by the EAPI 5 feature called SUBSLOT.
The second one is not solved yet, but I was told it is planned for EAPI=6. Thus you will still need to use haskell-updater from time to time.
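As a rough illustration of the SUBSLOT mechanism, here are the relevant ebuild variables (ebuilds are bash). This is a sketch, not a complete ebuild: the two halves belong to two different ebuilds, and the `PV` fallback is an assumption that exists only so the fragment can be sourced on its own:

```shell
# --- provider side, e.g. dev-haskell/binary ---
EAPI=5
# portage sets PV in a real ebuild; this fallback only lets the
# fragment be sourced standalone
: "${PV:=0.6.4.0}"
# "slot/subslot": every new version gets a new subslot, i.e. a new ABI
SLOT="0/${PV}"

# --- consumer side, e.g. dev-haskell/sha ---
# ':=' records the subslot seen at build time and asks for a rebuild
# once the installed subslot differs from the recorded one
RDEPEND="dev-haskell/binary:="
DEPEND="${RDEPEND}"
```

The `0/0.6.4.0` part of the package atoms in emerge output is exactly this slot/subslot pair.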
Anyway, I bumped the binary package today; here is how portage picks up all its immediate users:
# emerge -av1 dev-haskell/binary
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild r U ~] dev-haskell/binary-0.6.4.0:0/0.6.4.0::gentoo-haskell [0.6.2.0:0/0.6.2.0::gentoo-haskell] USE="doc hscolour {test} -hoogle -profile" 0 kB
[ebuild r U ~] dev-haskell/sha-1.6.1:0/1.6.1::gentoo-haskell [1.6.0:0/1.6.0::gentoo-haskell] USE="doc hscolour -hoogle -profile" 2,651 kB
[ebuild r U ~] dev-haskell/zip-archive-0.1.2.1-r2:0/0.1.2.1::gentoo-haskell [0.1.2.1-r1:0/0.1.2.1::gentoo-haskell] USE="doc hscolour {test} -hoogle -profile" 0 kB
[ebuild rR ~] dev-haskell/data-binary-ieee754-0.4.3:0/0.4.3::gentoo-haskell USE="doc hscolour -hoogle -profile" 0 kB
[ebuild rR ~] dev-haskell/dyre-0.8.11:0/0.8.11::gentoo-haskell USE="doc hscolour -hoogle -profile" 0 kB
[ebuild rR ~] dev-haskell/hxt-9.3.1.1:0/9.3.1.1::gentoo-haskell USE="doc hscolour -hoogle -profile" 0 kB
[ebuild rR ~] dev-haskell/hashed-storage-0.5.10:0/0.5.10::gentoo-haskell USE="doc hscolour {test} -hoogle -profile" 0 kB
[ebuild rR ~] dev-haskell/dbus-core-0.9.3-r1:0/0.9.3::gentoo-haskell USE="doc hscolour -hoogle -profile" 0 kB
[ebuild rR ~] dev-haskell/hoogle-4.2.14:0/4.2.14::gentoo-haskell USE="doc fetchdb hscolour -fetchdb-ghc -hoogle -localdb -profile" 0 kB
[ebuild rR ~] www-apps/gitit-0.10.0.2-r1:0/0.10.0.2::gentoo-haskell USE="doc hscolour plugins -hoogle -profile" 0 kB
[ebuild r U ~] dev-haskell/yesod-auth-1.1.1.7:0/1.1.1.7::gentoo-haskell [1.1.1.6:0/1.1.1.6::gentoo-haskell] USE="doc hscolour -hoogle -profile" 17 kB
[ebuild rR ~] dev-haskell/yesod-1.1.4:0/1.1.4::gentoo-haskell USE="doc hscolour -hoogle -profile" 0 kB
Total: 12 packages (4 upgrades, 8 reinstalls), Size of downloads: 2,668 kB
Would you like to merge these packages? [Yes/No]
I would like to rebuild all the sha (and so on) revdeps as well, but EAPI can’t express that kind of depends yet.
EAPI=5 ebuilds are slowly drifting into the main portage tree as well.
ghc-9999
The most interesting thing!
With Mark’s great help we now have a live ghc ebuild right out of the git tree!
One of the most notable things is the dynamic linking by default.
# ldd `which happy` # ghc-7.7.20121116
linux-vdso.so.1 (0x00007fffb0bff000)
libHScontainers-0.5.0.0-ghc7.7.20121116.so => /usr/lib64/ghc-7.7.20121116/containers-0.5.0.0/libHScontainers-0.5.0.0-ghc7.7.20121116.so (0x00007fe616972000)
libHSarray-0.4.0.1-ghc7.7.20121116.so => /usr/lib64/ghc-7.7.20121116/array-0.4.0.1/libHSarray-0.4.0.1-ghc7.7.20121116.so (0x00007fe6166d0000)
libHSbase-4.6.0.0-ghc7.7.20121116.so => /usr/lib64/ghc-7.7.20121116/base-4.6.0.0/libHSbase-4.6.0.0-ghc7.7.20121116.so (0x00007fe615df9000)
libHSinteger-gmp-0.5.0.0-ghc7.7.20121116.so => /usr/lib64/ghc-7.7.20121116/integer-gmp-0.5.0.0/libHSinteger-gmp-0.5.0.0-ghc7.7.20121116.so (0x00007fe615be6000)
libHSghc-prim-0.3.0.0-ghc7.7.20121116.so => /usr/lib64/ghc-7.7.20121116/ghc-prim-0.3.0.0/libHSghc-prim-0.3.0.0-ghc7.7.20121116.so (0x00007fe615976000)
libHSrts-ghc7.7.20121116.so => /usr/lib64/ghc-7.7.20121116/rts-1.0/libHSrts-ghc7.7.20121116.so (0x00007fe615715000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe61536c000)
libHSdeepseq-1.3.0.1-ghc7.7.20121116.so => /usr/lib64/ghc-7.7.20121116/containers-0.5.0.0/../deepseq-1.3.0.1/libHSdeepseq-1.3.0.1-ghc7.7.20121116.so (0x00007fe615162000)
libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x00007fe614ef4000)
libffi.so.6 => /usr/lib64/libffi.so.6 (0x00007fe614cec000)
libm.so.6 => /lib64/libm.so.6 (0x00007fe6149f2000)
librt.so.1 => /lib64/librt.so.1 (0x00007fe6147ea000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fe6145e6000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe616d41000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe6143ca000)
$ ls -lh `which pandoc` # ghc-7.7.20121116
-rwxr-xr-x 1 root root 6.3M Nov 16 16:38 /usr/bin/pandoc
$ ls -lh `which pandoc` # ghc-7.4.2
-rwxr-xr-x 1 root root 27M Nov 18 17:46 /usr/bin/pandoc
Actually, the whole ghc-9999 installation is 150MB smaller than ghc-7.4.1 on amd64.
Quite a win!
And as a side effect revdep-rebuild (or portage’s FEATURES=preserved-rebuild) can notice (and fix) breakages introduced by upgrades!
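The kind of breakage meant here shows up in `ldd` output as a `=> not found` entry. The following is only a sketch of that check, not how revdep-rebuild works internally; `missing_hs_libs` is a hypothetical helper name:

```shell
# Hypothetical helper: read `ldd` output on stdin and print every
# Haskell shared library (libHS*) that the loader could not resolve.
missing_hs_libs() {
    awk '$1 ~ /^libHS/ && /=> not found/ { print $1 }'
}

# Usage:
#   ldd "$(command -v happy)" | missing_hs_libs
```

An empty result means every libHS* dependency of the binary still resolves.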
Work on the ghc cross-compilation in the ebuild slowly continues (needs some upstream fixes to support toolchains inferred from build/host/target triplets).
Have fun!