Glocks: efficient support for highly-contended locks in many-core CMPs, Proceedings of the 2011 IEEE International Parallel and Distributed Processing Symposium, IPDPS '11, pp.893-905, 2011. ,
Adaptive backoff synchronization techniques, Proceedings of the 16th Annual International Symposium on Computer Architecture, ISCA '89, pp.396-406, 1989. ,
HPC runtime support for fast and power efficient locking and synchronization, Proceedings of the 2013 IEEE International Conference on Cluster Computing, CLUSTER '13, pp.1-7, 2013. ,
The performance of spin lock alternatives for shared-memory multiprocessors, IEEE Transactions on Parallel and Distributed Systems, vol.1, issue.1, pp.6-16, 1990. ,
DOI : 10.1109/71.80120
Concurrent Programming: principles and Practice, 1991. ,
Enhancement to the MCS lock for increased functionality and improved programmability. U.S. patent application 10, p.745, 2003. ,
Thin locks: featherweight synchronization for java, Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, PLDI '98, pp.258-268, 1998. ,
The multikernel, Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP '09, pp.29-44, 2009. ,
DOI : 10.1145/1629575.1629579
Lightweight remote procedure call, ACM Transactions on Computer Systems, vol.8, issue.1, pp.37-55, 1990. ,
DOI : 10.1145/77648.77650
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.361.2695
URPC: a toolkit for prototyping remote procedure calls, The Computer Journal, vol.39, issue.6, pp.525-540, 1996. ,
Optimal Strategies for Spinning and Blocking, Journal of Parallel and Distributed Computing, vol.21, issue.2, pp.246-254, 1994. ,
DOI : 10.1006/jpdc.1994.1056
Corey: an operating system for many cores, Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI '08, pp.43-57, 2008. ,
An analysis of linux scalability to many cores, Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, OSDI '10, 2010. ,
Non-scalable locks are dangerous, Proceedings of the 13th Ottawa Linux Symposium, OLS '13, 2012. ,
Improved analysis and evaluation of real-time semaphore protocols for P-FP scheduling, 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS), pp.141-152, 2013. ,
DOI : 10.1109/RTAS.2013.6531087
Fully-adaptive algorithms for long-lived renaming, Proceedings of the 20th International Conference on Distributed Computing, DISC '06, pp.413-427, 2006. ,
DOI : 10.1007/s00446-011-0137-5
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.84.6011
A Portable Programming Interface for Performance Evaluation on Modern Processors, International Journal of High Performance Computing Applications, vol.14, issue.3, pp.189-204, 2000. ,
DOI : 10.1177/109434200001400303
Locking policies for multiprocessor ada, ACM SIGAda Ada Letters, vol.33, issue.2, pp.59-65, 2013. ,
DOI : 10.1145/2552999.2553006
A Schedulability Compatible Multiprocessor Resource Sharing Protocol -- MrsP, 2013 25th Euromicro Conference on Real-Time Systems, pp.282-291, 2013. ,
DOI : 10.1109/ECRTS.2013.37
Message Passing or Shared Memory: Evaluating the Delegation Abstraction for Multicores, Proceedings of the 17th International Conference on Principles of Distributed Systems, OPODIS '13, pp.83-97, 2013. ,
DOI : 10.1007/978-3-319-03850-6_7
Sharing and protection in a single-address-space operating system, ACM Transactions on Computer Systems, vol.12, issue.4, pp.271-307, 1994. ,
DOI : 10.1145/195792.195795
Fast asymmetric thread synchronization, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, pp.1-2722, 2013. ,
DOI : 10.1145/2400682.2400686
Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor, IEEE Micro, vol.30, issue.2, pp.16-29, 2010. ,
DOI : 10.1109/MM.2010.31
Building FIFO and priority-queueing spin locks from atomic swap, 2003. ,
Memcached: distributed memory object caching system ,
Traffic management: a holistic approach to memory placement on NUMA systems, Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pp.381-394, 2013. ,
Continuously measuring critical section pressure with the free lunch profiler, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00957154
Everything you always wanted to know about synchronization but were afraid to ask, Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pp.33-48, 2013. ,
DOI : 10.1145/2517349.2522714
MapReduce, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008. ,
DOI : 10.1145/1327452.1327492
Polite busy-waiting with wrpause on sparc, 2012. ,
Flat-combining NUMA locks, Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures, SPAA '11, pp.65-74, 2011. ,
DOI : 10.1145/1989493.1989502
Lock cohorting: a general technique for designing NUMA locks, Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pp.247-256, 2012. ,
Cooperating Sequential Processes, 1965. ,
DOI : 10.1007/978-1-4757-3472-0_2
A Scalability-Aware Kernel Executive for Many-Core Operating Systems, Proceedings of the 1st Workshop on Runtime and Operating Systems for the Many-core Era, WROSME '13, pp.1-10, 2013. ,
DOI : 10.1007/978-3-642-54420-0_80
Smartlocks, Proceeding of the 7th international conference on Autonomic computing, ICAC '10, pp.215-224, 2010. ,
DOI : 10.1145/1809049.1809079
Sim: a highly-efficient wait-free universal construction ,
Revisiting the combining synchronization technique, Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pp.257-266 ,
On the inherent weakness of conditional synchronization primitives, Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing, PODC '04, pp.80-87, 2004. ,
DOI : 10.1145/1011767.1011780
Distributed caching with memcached, Linux Journal, issue.124, p.5, 2004. ,
Evolving mach 3.0 to a migrating thread model, Proceedings of the USENIX Winter 1994 Technical Conference, WTEC'94, pp.9-9, 1994. ,
Refactoring: Improving the Design of Existing Code, 1999. ,
DOI : 10.1007/3-540-45672-4_31
A study of the scalability of stop-the-world garbage collectors on multicores, Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pp.229-240, 2013. ,
Language support for lightweight transactions, ACM SIGPLAN Notices, vol.38, issue.11, pp.388-402, 2003. ,
DOI : 10.1145/949343.949340
Towards whatever-scale abstractions for data-driven parallelism, Proceedings of the 1st International Workshop on Rack Scale Computing, p.14, 2014. ,
Remote Invalidation: Optimizing the Critical Path of Memory Transactions, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014. ,
DOI : 10.1109/IPDPS.2014.30
Time-published queue-based spin locks ,
DOI : 10.1007/11602569_6
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.8532
Preemption Adaptivity in Time-Published Queue-Based Spin Locks, Proceedings of the 11th International Conference on High Performance Computing, HiPC'05, pp.7-18, 2005. ,
DOI : 10.1007/11602569_6
Flat combining and the synchronization-parallelism tradeoff, Proceedings of the Twenty-second Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '10, pp.355-364, 2010. ,
Obstruction-free synchronization: double-ended queues as an example, 23rd International Conference on Distributed Computing Systems, 2003. Proceedings., pp.522-529, 2003. ,
DOI : 10.1109/ICDCS.2003.1203503
Software transactional memory for dynamic-sized data structures, Proceedings of the twenty-second annual symposium on Principles of distributed computing, PODC '03, pp.92-101, 2003. ,
DOI : 10.1145/872035.872048
Transactional memory: architectural support for lock-free data structures, Proceedings of the 20th Annual International Symposium on Computer Architecture, ISCA '93, pp.289-300, 1993. ,
The art of multiprocessor programming, Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing, PODC '06, 2008. ,
DOI : 10.1145/1146381.1146382
Impossibility and universality results for wait-free synchronization, Proceedings of the seventh annual ACM Symposium on Principles of distributed computing, PODC '88, pp.276-290, 1988. ,
DOI : 10.1145/62546.62593
URL : http://repository.cmu.edu/cgi/viewcontent.cgi?article=2796&context=compsci
Monitors: an operating system structuring concept, Communications of the ACM, vol.17, issue.10, pp.549-557, 1974. ,
DOI : 10.1145/355620.361161
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.6394
Decoupling contention management from scheduling, Proceedings of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems, ASPLOS XV, pp.117-128, 2010. ,
DOI : 10.1145/1735970.1736035
URL : http://infoscience.epfl.ch/record/142307
To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach, Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pp.357-368, 2013. ,
An architecture for mostly functional languages, Proceedings of the 1986 ACM conference on LISP and functional programming, LFP '86, pp.105-112, 1986. ,
DOI : 10.1145/319838.319854
Wait-free queues with multiple enqueuers and dequeuers, Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, pp.223-234, 2011. ,
DOI : 10.1145/2038037.1941585
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.222.6484
A methodology for creating fast wait-free data structures, Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pp.141-150 ,
Bias scheduling in heterogeneous multi-core architectures, Proceedings of the 5th European conference on Computer systems, EuroSys '10, pp.125-138, 2010. ,
DOI : 10.1145/1755913.1755928
Memprof: a memory profiler for NUMA multicore systems, Proceedings of the 2012 USENIX Conference on Annual Technical Conference, USENIX ATC '12, pp.5-5 ,
URL : https://hal.archives-ouvertes.fr/hal-00945731
System Specifications for the DYSEAC, Journal of the ACM, vol.1, issue.2, pp.57-81, 1954. ,
DOI : 10.1145/320772.320773
A modeling study of the TPC-C benchmark, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, SIGMOD '93, pp.22-31, 1993. ,
Sheriff: precise detection and automatic mitigation of false sharing, Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications, pp.3-18, 2011. ,
Le Remote Core Lock (RCL) : une nouvelle technique de verrouillage pour les architectures multi-coeur, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-01302676
PHP bug report #62064, 2012. ,
Remote Core Locking: migrating critical-section execution to improve the performance of multithreaded applications, Proceedings of the 2012 USENIX Annual Technical Conference, USENIX ATC '12, pp.65-76, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00991709
Efficient locking for multicore architectures, 2011. ,
A Hierarchical CLH Queue Lock, Proceedings of the 12th International Conference on Parallel Processing, Euro-Par'06, pp.801-810, 2006. ,
DOI : 10.1007/11823285_84
Queue locks on cache coherent multiprocessors, Proceedings of 8th International Parallel Processing Symposium, pp.165-171, 1994. ,
DOI : 10.1109/IPPS.1994.288305
Algorithms for scalable synchronization on shared-memory multiprocessors, ACM Transactions on Computer Systems, vol.9, issue.1, pp.21-65, 1991. ,
DOI : 10.1145/103727.103729
Synchronization without contention, Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IV, pp.269-278, 1991. ,
DOI : 10.1145/106972.106999
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.17.2001
Simple, fast, and practical non-blocking and blocking concurrent queue algorithms, Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing, PODC '96, pp.267-275, 1996. ,
DOI : 10.1145/248052.248106
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3574
Helios, Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP '09, pp.221-234, 2009. ,
DOI : 10.1145/1629575.1629597
Scheduling techniques for concurrent systems, Proceedings of the 3rd International Conference on Distributed Computing Systems, ICDCS'82, pp.22-30, 1982. ,
Executing parallel programs with synchronization bottlenecks efficiently, Proceedings of the International Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications, PDSIA'99 ,
Documenting and automating collateral evolutions in linux device drivers, Proceedings of the 3rd European Conference on Computer Systems 2008, Eurosys '08, pp.247-260, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00123142
Evaluation of message passing synchronization algorithms in embedded systems, 2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), p.14, 2014. ,
DOI : 10.1109/SAMOS.2014.6893222
A low-overhead coherence solution for multiprocessors with private cache memories, Proceedings of the 11th Annual International Symposium on Computer Architecture, ISCA '84, pp.348-354, 1984. ,
Computer Organization and Design: the Hardware/Software Interface, 2007. ,
Locating cache performance bottlenecks using data profiling, Proceedings of the 5th European conference on Computer systems, EuroSys '10, pp.335-348, 2010. ,
DOI : 10.1145/1755913.1755947
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.163.9819
Leveraging hardware message passing for efficient thread synchronization, Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '14, pp.143-154, 2014. ,
Lock contention aware thread migrations, Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '14, pp.369-370, 2014. ,
DOI : 10.1145/2692916.2555273
Hierarchical backoff locks for nonuniform communication architectures, Proceedings of the 9th International Symposium on High-Performance Computer Architecture, HPCA '03, pp.241-253, 2003. ,
Design and development of MINIX distributed operating system, Proceedings of the 1988 ACM sixteenth annual conference on Computer science, CSC '88, pp.685-685, 1988. ,
DOI : 10.1145/322609.323152
Evaluating MapReduce for Multi-core and Multiprocessor Systems, 2007 IEEE 13th International Symposium on High Performance Computer Architecture, pp.13-24, 2007. ,
DOI : 10.1109/HPCA.2007.346181
Instruction-level parallel processing: History, overview, and perspective, The Journal of Supercomputing, vol.34, issue.1, pp.9-50, 1993. ,
DOI : 10.1007/BF01205181
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.2892
EHCtor: detecting resource-release omission faults in error-handling code for systems software, CFSE '9, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01302679
HECTOR, Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics, BCB '15, pp.1-12, 2013. ,
DOI : 10.1145/2808719.2808725
URL : https://hal.archives-ouvertes.fr/hal-00918079
Scalable queue-based spin locks with timeout, Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, PPoPP '01, pp.44-52, 2001. ,
DOI : 10.1145/379539.379566
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.551.8520
Hugh: A Semantically Aware Universal Construction for Transactional Memory Systems, Proceedings of the 19th International Conference on Parallel Processing, Euro-Par '13, pp.470-481, 2013. ,
DOI : 10.1007/978-3-642-40047-6_48
Software transactional memory, Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing, PODC '95, pp.204-213, 1995. ,
SPLASH, ACM SIGARCH Computer Architecture News, vol.20, issue.1, pp.5-44, 1992. ,
DOI : 10.1145/130823.130824
Thread migration to improve synchronization performance, Proceedings of the 2nd Workshop on Operating System Interference in High Performance Applications, OSIHPA '06, 2006. ,
The Phoenix system for MapReduce programming ,
Accelerating critical section execution with asymmetric multi-core architectures, Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XIV, pp.253-264, 2009. ,
Phoenix++, Proceedings of the second international workshop on MapReduce and its applications, MapReduce '11, pp.9-16, 2011. ,
DOI : 10.1145/1996092.1996095
Distributed operating systems anno 1992. What have we learned so far?, Distributed Systems Engineering, pp.3-10, 1993. ,
DOI : 10.1088/0967-1846/1/1/001
PHP: hypertext preprocessor ,
Simultaneous multithreading: maximizing on-chip parallelism, Proceedings of the 22nd Annual International Symposium on Computer Architecture, ISCA '95, pp.392-403, 1995. ,
DOI : 10.1109/isca.1995.524578
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.1383
Combiner/aggregator synchronization primitive. https://software.intel.com/en-us/blogscombineraggregator-synchronization-primitive, 2013. ,
Selective core boosting: the return of the turbo button, 2013. ,
The SPLASH-2 programs: characterization and methodological considerations, Proceedings of the 22nd Annual International Symposium on Computer Architecture, ISCA '95, pp.24-36, 1995. ,
Ad hoc synchronization considered harmful, Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI '10, pp.1-8, 2010. ,
Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.198-207, 2009. ,
DOI : 10.1109/IISWC.2009.5306783
Automatic measurement of memory hierarchy parameters, Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '05, pp.181-192, 2005. ,
Memory management for many-core processors with software configurable locality policies, Proceedings of the 2012 International Symposium on Memory Management, ISMM '12, pp.3-14 ,