Planet Clang
September 12, 2016
The new LLVM Foundation board of directors: Chandler Carruth, Hal Finkel, Arnaud de Grandmaison, David Kipping, Anton Korobeynikov, Tanya Lattner, Chris Lattner, and John Regehr.

Three new members and five continuing members were elected to the eight-person board. The new board consists of individuals from corporations and from the academic and scientific communities. They also represent various geographical groups of the LLVM community. All board members are dedicated and passionate about the programs of the LLVM Foundation and about growing and supporting the LLVM community. When voting on new board members, we took into consideration all contributions (past and present) and current involvement in the LLVM community. We also tried to create a balanced board of individuals from a wide range of backgrounds and locations, to provide a voice to as many groups within the LLVM community as possible. We want to thank everyone who applied, as we had many strong applications. As the programs of the LLVM Foundation grow, we will be relying on volunteers to help us reach success. Please join our mailing list
to be informed of volunteer opportunities.

About the board of directors (listed alphabetically by last name):

Chandler Carruth has been an active contributor to LLVM since 2007. Over the years, he has worked on LLVM's memory model and atomics, Clang's C++ support, the GCC-compatible driver, the initial profile-aware code layout optimization pass, the pass manager, IPO infrastructure, and much more. He is the current code owner of inlining and SSA formation.

In addition to his numerous technical contributions, Chandler has led Google's LLVM efforts since 2010 and shepherded a number of new efforts that have positively and significantly impacted the LLVM project. These include adding C++ modules to Clang, adding address and other sanitizers to Clang/LLVM, making Clang compatible with MSVC and available to the Windows C++ developer community, and much more. Chandler works at Google Inc. as a technical lead for their C++ developer platform and has served on the LLVM Foundation board of directors for the last 2 years.

Hal Finkel has been an active contributor to the LLVM project since 2011. He is the code owner for the PowerPC target, the alias-analysis infrastructure, the loop re-roller, and the basic-block vectorizer.
In addition to his numerous technical contributions, Hal has chaired the LLVM in HPC workshop, held in conjunction with Supercomputing (SC), for the last 3 years. This workshop provides a venue for the presentation of peer-reviewed HPC-related research in LLVM from both industry and academia. He has also been involved in organizing an LLVM-themed BoF session at SC and LLVM socials in Austin. Hal is Lead for Compiler Technology and Programming Languages at Argonne National Laboratory's Leadership Computing Facility.

Arnaud de Grandmaison has been hacking on LLVM projects since 2008. In addition to his open source contributions, he has worked for many years on private out-of-tree LLVM-based projects at Parrot, DiBcom, and ARM. He has also been a leader in the European LLVM community: he has organized the EuroLLVM Developers' Meeting and Paris socials, and has chaired or participated in numerous program committees for the LLVM Developers' Meetings and other LLVM-related conferences. Arnaud has attended numerous LLVM Developers' Meetings, volunteering as a moderator and presenting as well. He also moderates several LLVM mailing lists.
Arnaud is also very involved in community-wide discussions and decisions such as re-licensing and the code of conduct. Arnaud is a Principal Engineer at ARM.

David Kipping has been involved with the LLVM project since 2010. He has been a key organizer and supporter of many LLVM community events, such as the US and European LLVM Developers' Meetings, and has served on many of the program committees for these events. David has worked hard to advance the adoption of LLVM at Qualcomm and other companies; one example of his efforts is the LLVM track he created at the 2011 Linux Collaboration Summit. He has over 30 years of experience in open source and developer tools, including working on C++ at Borland. David has served on the board of directors for the last 2 years and has held the officer position of treasurer. The treasurer is a time-demanding position: he supports the day-to-day operation of the foundation, balances the books, and generates monthly treasurer reports. David is Director of Product Management at Qualcomm.

Anton Korobeynikov has been an active contributor to the LLVM project since 2006. Over the years, he has made numerous technical contributions to areas including Windows support, ELF features, debug info, exception handling, and backends such as ARM and x86. He was the original author of the MSP430 and SystemZ backends. In addition to his technical contributions, Anton has maintained LLVM's participation in Google Summer of Code by managing applications, deadlines, and overall organization. He also supports the LLVM infrastructure and has been on numerous program committees for the LLVM Developers' Meetings (both US and EuroLLVM). Anton is currently an associate professor at Saint Petersburg State University and has served on the LLVM Foundation board of directors for the last 2 years.

Tanya Lattner has been involved in the LLVM project for over 14 years. She began as a graduate student who wrote her master's thesis using LLVM, and continued using and extending LLVM technologies at various jobs during her career as a compiler engineer.
Tanya has been organizing the US LLVM Developers' Meeting since 2008 and has attended every developer meeting. She was the LLVM release manager for 3 years, moderates the LLVM mailing lists, and helps administer the LLVM infrastructure: servers, mailing lists, Bugzilla, etc. Tanya has also been on the program committee for the US LLVM Developers' Meeting (4 years) and the EuroLLVM Developers' Meeting (1 year). With the support of the initial board of directors, Tanya created the LLVM Foundation, defined its charitable and educational mission, and worked to obtain 501(c)(3) status. Tanya is the Chief Operating Officer and has served as the President of the LLVM Foundation board for the last 2 years.

Chris Lattner is well known as the founder of the LLVM project and has a lengthy history of technical contributions to the project over the years.
He drove much of the early implementation, architecture, and design of LLVM and Clang. Chris has attended every LLVM Developers' Meeting and presented at the majority of them. He helped drive the conception and incorporation of the LLVM Foundation, and has served as Secretary of the board for the last 2 years. Chris also grants commit access to the LLVM project, moderates mailing lists, moderates and edits the LLVM blog, and drives important non-technical discussions and policy decisions related to the LLVM project. Chris manages the Developer Tools department at Apple Inc. and has served on the LLVM Foundation board of directors for the last 2 years.

John Regehr has been involved in LLVM for a number of years. As a professor of computer science at the University of Utah, his research specializes in compiler correctness and undefined behavior. He is well known within the LLVM community for the hundreds of bug reports his group has filed against LLVM/Clang. John was a project lead for IOC, a Clang-based integer overflow checker that eventually became the basis for the integer parts of UBSan. He was also the primary developer of C-Reduce, which utilizes Clang as a library and is often used as a test-case reducer for compiler issues. In addition to his technical contributions, John has served on several LLVM-related program committees. He also has a widely read blog about LLVM and other compiler-related issues.
August 17, 2016
We are excited to announce the next release of the Intel(R) OpenMP* Runtime Library at openmprtl.org. This release aligns with Intel(R) Parallel Studio XE 2017 Composer Edition.
New Features:
OpenMP* 4.5 nonmonotonic modifier support for schedule(dynamic) and schedule(guided) (see the example below)
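As a sketch of what this enables at the source level (the function and loop below are invented examples; the pragma itself is standard OpenMP* 4.5 syntax):

  /* The nonmonotonic modifier tells the runtime that chunks of this
     dynamically scheduled loop need not be handed to a thread in
     increasing order, which permits cheaper work-distribution schemes. */
  void scale(float *a, int n) {
    #pragma omp parallel for schedule(nonmonotonic:dynamic, 64)
    for (int i = 0; i < n; ++i)
      a[i] *= 2.0f;
  }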
Bug Fixes:
July 22, 2016
Keeping and sharing confidential documents on the Internet has become an indispensable part of modern business. A wide range of providers offer highly protected digital venues intended to simplify the entire course of a transaction: so-called VDRs (virtual data rooms). Their main function is to keep virtual versions of documents safe. A virtual repository can be used not merely as storage for secret data but also as a space where numerous partners can exchange and discuss files, transactions, and deals. In this way, virtual platforms have reduced the need to complete an entire project through personal negotiations alone. Since VDRs are becoming more and more popular, many vendors offer their services on the market. Undoubtedly, not all existing platforms are reliable and decent enough to rely on. To choose a reliable platform, one should pay attention to several characteristics.
1. Reputation and experience of the VDR provider
Be attentive to the vendor's reputation. Usually, reputation can be assessed from the reviews available in the media. Both the comments of regular room users and investigations by experts are handy when choosing a virtual data room vendor. In addition, it can be useful to look through the details of transactions completed with the help of the provider and, possibly, even to meet deal-makers from a corporation that has already used its services. A decent vendor will also be capable of offering customized services to a VDR user.
2. The functions the virtual room will provide
When searching for a virtual room, a certain set of crucially important options must be considered. A decent room concentrates on document and access protection. The data room is expected to hold the relevant certificates (SSAE 16 and ISO 27001) and to provide data encryption, firewalls, multi-step user verification, watermarks, etc. In addition, the room administrator should have full rights to manage access to the virtual room in general and to chosen documents and folders. Regular audit reports make it possible to keep an eye on all actions in the virtual repository. Besides being secure, the virtual room should be convenient to use, so a simple interface that eases navigation in the VDR is crucial. Various upload and search instruments also simplify and speed up work in the virtual room. These options do not constitute an exhaustive list of the instruments a user should look for in a virtual data room: the demands will depend on the customer's needs.
3. Expected expenditures
Since there are various providers on the market, potential clients can choose among repositories at different prices: the cost varies significantly with the vendor, the expected duration of the deal, the particular options demanded, etc. Hence, the deal-maker should be realistic and determine how much he is willing to pay for the virtual room.
4. The benefits you and your stakeholders will experience
A platform has to be not only affordable and comfortable for the owner but must also meet the expectations of the owner's present or potential partners. At times it is advisable to choose a more expensive virtual room if it has features required by the other side of the deal.
5. Required set of functions
Before making a first payment for a virtual data room, a critical evaluation of one's demands and expectations should be performed: no deal-maker wants to pay for expensive software bundled with useless tools. The client has to make sure he is paying for what he really needs and expects, not for trendy and catchy instruments that have nothing to do with information security and storage.
Keeping these simple guidelines in mind, the potential VDR user will have few doubts when selecting a VDR. Although the selection process may require a considerable amount of time, it is better to spend a bit more time trying demo versions of diverse virtual repositories than to choose the first virtual platform on sale at an affordable price. Remember that you are paying for your protection and comfort, and the virtual data room is not the place to save money.
June 21, 2016
This blog post covers ThinLTO's background, design, current status, and usage information. This post was written by Teresa Johnson, Mehdi Amini and David Li.

LTO Background and Motivation

LTO (Link Time Optimization) is a method for achieving better runtime performance through whole-program analysis and cross-module optimization. During the compile phase, clang will emit LLVM bitcode
instead of an object file. The linker recognizes these bitcode files and invokes LLVM during the link to generate the final objects that will constitute the executable. The LLVM implementation loads all input bitcode files and merges them together to produce a single Module. The interprocedural analyses (IPA) as well as the interprocedural optimizations (IPO) are performed serially on this monolithic Module.

What this means in practice is that LTO often requires a large amount of memory (to hold all IR at once) and is very slow. With debug information enabled via -g, the size of the IR and the resulting memory requirements are significantly larger. Even without debug information, this is prohibitive for very large applications, or when compiling on memory-constrained machines. It also makes incremental builds less effective, as everything from the LTO step onward must be re-executed whenever any input source changes.

ThinLTO Design

ThinLTO is a new approach designed to scale like a non-LTO build, while retaining most of the performance achievement of full LTO. In ThinLTO, the serial step is very thin and fast: instead of loading and merging the bitcode into a single monolithic module to perform global analyses, it utilizes compact summaries of each module in the serial link step, together with an index of function locations for later cross-module importing. Function importing and other IPO transformations are performed later, when the modules are optimized in fully parallel backends.

The key transformation enabled by ThinLTO global analyses is function importing, in which only those functions likely to be inlined are imported into each module. This minimizes the memory overhead in each ThinLTO backend, while maximizing the most impactful cross-module optimization opportunities. The IPO transformations are therefore performed on each module extended with its imported functions.

The ThinLTO process is divided into 3 phases:

Compile: Generate IR as with full LTO mode, but extended with module summaries
Thin Link: Thin linker plugin layer to combine summaries and perform global analyses
ThinLTO backend: Parallel backends with summary-based importing and optimizations

By default, linkers that support ThinLTO (see below) are set up to launch the ThinLTO backends in threads, so the distinction between the second and third phases is transparent to the user.

The key enabler for this process is the summaries emitted during phase 1. These summaries are emitted using the bitcode format, but are designed so that they can be loaded separately, without involving an LLVMContext or any other expensive construction. Each global variable and function has an entry in the module summary. An entry contains metadata that abstracts the symbol it describes. For example, a function is abstracted by its linkage type, the number of instructions it contains, and optional profiling information (PGO). Additionally, every reference (address taken, direct call) to another global is recorded. This information enables building a complete reference graph during the Thin Link phase, and subsequent fast analyses using the global summary information.
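As a purely conceptual illustration of what such a summary entry records (the names below are invented for this post and are not LLVM's actual data structures):

  /* Conceptual sketch of a per-function ThinLTO summary entry. */
  enum Linkage { LINKAGE_EXTERNAL, LINKAGE_INTERNAL, LINKAGE_LINKONCE_ODR };

  struct GlobalRef {
    unsigned long long target_guid;  /* the global referenced or called */
  };

  struct FunctionSummary {
    enum Linkage linkage;       /* linkage type of the function */
    unsigned inst_count;        /* number of instructions, an import heuristic */
    int has_profile_data;       /* optional profiling information (PGO) */
    struct GlobalRef *refs;     /* address-taken references to other globals */
    unsigned num_refs;
    struct GlobalRef *calls;    /* direct call edges */
    unsigned num_calls;
  };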
Current Status

ThinLTO is currently supported in both the gold plugin and in ld64; support is also being added to the lld linker. The 3.9 release of clang will have ThinLTO accessible via the -flto=thin command-line option.

While tuning is still in progress, ThinLTO already performs well compared to LTO, in many cases matching the performance improvement. In a few cases ThinLTO even outperforms full LTO, most likely because the higher scalability of ThinLTO allows using a more aggressive backend optimization pipeline (similar to that of a non-LTO build). The following results were collected for the C/C++ SPEC cpu2006 benchmarks on an 8-core 2.6GHz Intel Xeon E5-2689. Each benchmark was run in isolation three times, and the results shown are the average of the three runs.

Critically, due to the scalable design of ThinLTO, this performance is achieved with a build time that stays within a non-LTO build scale. The following build times were collected on a 20-core 2.8GHz Intel Xeon E5-2680 v2, running Linux and using the gold linker. The results are for an end-to-end build of clang (ninja clang) from a clean build directory, so they include all the compile steps and links of intermediate binaries such as llvm-tblgen and clang-tblgen.

The Release build shows that ThinLTO build time is very comparable to a non-LTO build. Adding -gline-tables-only adds a very small overhead, and ThinLTO is again similar to the regular non-LTO build. However, with full debug information, ThinLTO is still somewhat slower than a non-LTO build due to the additional overhead during importing. Ongoing improvements to debug metadata representation and handling are expected to continue to reduce this overhead. In all cases, full LTO is actually significantly slower.

On the memory consumption side, the improvements are significant. Over the last two years, full LTO was improved substantially, as shown on the chart below, but our measurements show that ThinLTO keeps a large advantage.

Usage Information

To utilize ThinLTO, simply add the -flto=thin option to compile and link. E.g.
% clang -flto=thin -O2 file1.c file2.c -c
% clang -flto=thin -O2 file1.o file2.o -o a.out

As mentioned earlier, by default the linkers will launch the ThinLTO backend threads in parallel, passing the resulting native object files back to the linker for the final native link.
As such, the usage model is the same as non-LTO. Similar to regular LTO, this requires using the gold linker plugin on Linux,
or a recent ld64 on Darwin.

Distributed Build Support

To take advantage of a distributed build system, the parallel ThinLTO backends can each be launched as a separate process. To support this, the gold plugin provides a thinlto-index-only option that causes the link to exit after creating the combined index and performing global analysis. Additionally, in this mode:

Instead of using a monolithic combined index, a separate individual index file is written per backend, containing the necessary portions of the combined index for recording the imports and any other global summary-based optimization decisions that should be acted on in the backend.
A plain-text listing of the bitcode files each module will import from is optionally emitted, to aid in distributed build file staging (thinlto-emit-imports-files plugin option).

The backends can be launched by invoking clang on the bitcode and providing its index via an option. Finally, the resulting native objects are linked to generate the final binary. For example:
% clang -flto=thin -O2 file1.c file2.c -c
% clang -flto=thin -O2 file1.o file2.o -Wl,-plugin-opt,-thinlto-index-only
% clang -O2 -o file1.native.o -x ir file1.o -c -fthinlto-index=./file1.o.thinlto.bc
% clang -O2 -o file2.native.o -x ir file2.o -c -fthinlto-index=./file2.o.thinlto.bc
% clang file1.native.o file2.native.o -o a.out

Incremental ThinLTO Support

With full LTO, only the initial compile steps can be performed incrementally: if any input has changed, the expensive serial IPA/IPO step must be redone. With ThinLTO, the serial Thin Link step must be redone if any input has changed; however, as noted earlier, this step is small and fast and does not involve loading any module. And any particular ThinLTO backend must be redone iff:

The corresponding (primary) module's bitcode changed
The list of imports into or exports from the module changed
The bitcode for any module being imported from has changed
Any global analysis result affecting either the primary module or anything it imports has changed

For single-machine builds, where the threads are launched by the linker, incremental builds can be achieved by caching each module after applying the global summary-based optimizations such as importing, using a hash of the information listed above as the key. This caching is already supported in libLTO's ThinLTO handling, which is used by ld64. To enable it, the link step needs to be passed an extra flag: -Wl,-cache_path_lto,/path/to/cache

For distributed builds, the information in items 2-4 above is all serialized into the individual index files, so the build system can compare the contents of the input bitcode files (the primary module's bitcode and any it imports from) along with the combined index against those from an earlier build, to decide whether a particular ThinLTO backend must be redone. To make this process more efficient, the content of the bitcode file is hashed when emitted during the compile phase, and the result is stored in the bitcode file itself, so that the cache can be queried during the Thin Link step without reading the IR.

The chart below illustrates the full build time of clang in three different situations:

The full link following a clean build.
The developer fixes the implementation of DenseMap::grow(). This is a widely used header in the project, which forces a large number of files to be rebuilt.
The developer fixes the implementation of visitCallInst() in InstCombineCalls.cpp. This is an implementation file, and the incremental build should be fast.

These results illustrate how full LTO is not friendly to incremental builds, and show that ThinLTO provides an incremental link time very close to that of a non-LTO build.
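As a concrete example, a single-machine incremental link using the ld64 caching flag mentioned above might look like this (the cache path is illustrative):

% clang -flto=thin -O2 file1.o file2.o -Wl,-cache_path_lto,/path/to/cache -o a.out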
June 16, 2016
The page gives a summary of the significant changes it found today. An example of the kind of view you can get on that page is the following.

In the above screenshot, you can see that there were performance differences on 3 different programs: bigfib, fasta and ffbench. The improvement on ffbench only shows up on a machine named "machine3", whereas the performance regression on the other 2 programs shows up on multiple machines.

The table shows how performance evolved over the past 7 days, one column for each day. The sparkline on the right shows graphically how performance has evolved over those days. When the program was run multiple times to get multiple sample points, these show as separate dots that are vertically aligned (because they happened on the same date). The background color in the sparkline represents a hash of the program binary: if the color is the same on multiple days, the binaries were identical on those days.

Let's look first at the ffbench program. The background color in the sparkline is the same for the last 2 days, so the binary for this program didn't change in those 2 days. Conclusion: the reported performance variation of -8.23% is caused by noise on the machine, not by a change in code. The vertically spread-out dots also indicate that this program has been consistently noisy over the past 7 days.

Let's now look at bigfib. The background color in the sparkline has changed since its previous run, so let's investigate further. By clicking on one of the machine names in the table, we go to a chart showing the long-term evolution of the performance of this program on that machine.

Step 2. The long-term performance evolution chart

This view shows how performance has evolved for this program since we started measuring it. When you click on one of the dots, which each represent a single execution of the program, you get a pop-up with information such as the revision and the date at which it was run. When you click on the number after "Run:" in that pop-up, it brings you to the run page.

Step 3. The Run page

The run page gives an overview of a full "Run" on a given machine. Exactly what a Run contains depends a bit on how you organize the data, but typically it consists of many programs being run a few times on 1 machine, representing the quality of the code generated by a specific revision of the compiler on one machine, for one optimization level. This run page shows a lot of information, including performance changes seen since the previous run. When hovering with the mouse over entries, a "Profile" button appears that, when clicked, shows profiles of both the previous run and the current run.

Step 4. The Profile page

At the top, the page gives you an overview of differences in recorded performance events between the current and previous run. After selecting which function you want to compare, this page shows you the annotated assembly. While it's clear that there are differences between the disassembly, it's often much easier to understand the differences by reconstructing the control flow graph to get a per-basic-block view of the differences. By clicking on the "View:" drop-down box and selecting the assembly language you see, you can get a CFG view. I find that showing absolute values rather than relative values helps in understanding performance differences, so I also chose "Absolute numbers" in the drop-down box on the far right. There is obviously a single hot basic block, and there are differences in instructions in the 2 versions.
The number in the red side-bar shows that the number of cycles spent in this basic block has increased from 431M to 716M. In just a few clicks, I managed to drill down to the key codegen change that caused the performance difference!

We combine the above workflow with the llvmbisect tool
to quickly find the commit that introduced the performance difference. We find both the above LNT workflow and the llvmbisect tool vital for acting quickly on performance deltas.

Pointers on setting up your own LNT server for tracking performance

Setting up an LNT server is as simple as running the half a dozen commands documented
under "Installation" and "Viewing Results". The "Running tests" section is specific to LLVM tests, the rest is generic to performance tracking of general software.The documentation for the json file format to submit results to the LNT server is here: .The documentation for how to also add profile information, is at .
April 27, 2016
The LLVM Foundation originally grew out of the need to have a legal entity to plan and support the annual LLVM Developers' Meeting and LLVM infrastructure. However, as the Foundation was created, we saw a need for help in other areas related to the LLVM project, compilers, and tools. The LLVM Foundation has established 3 main programs: Educational Outreach, Grants & Scholarships, and Women in Compilers & Tools.

Educational Outreach

The LLVM Foundation plans to expand its educational materials and events related to the LLVM project and compiler technology and tools. First, the LLVM Foundation is excited to announce that the 2016 Bay Area LLVM Developers' Meeting will be held November 3-4 in San Jose, CA. This year will be the 10th anniversary of the developer meeting, which brings together developers of LLVM, Clang, and related projects. For this year's meeting, we are increasing our registration cap to 400 in order to allow more community members to attend.

We are also investigating how we can support or be involved in other conferences in the field of compilers and tools. This may include things such as LLVM workshops or tutorials, sponsoring presenters, or providing instructional materials. We plan to work with other conference organizers to determine how the LLVM Foundation can be helpful, and to develop a plan going forward. However, we want to do more for the community and have brainstormed some ideas for the coming year. We plan to create some instructional videos for those just beginning with LLVM: short 5-10 minute videos that introduce developers to the project and get them started. Documentation is always important, but we find that many are turning to videos as a way to learn.

Grants & Scholarships

We are creating a grants and scholarships program to cover student presenter travel expenses to the LLVM Developers' Meetings. We also hope to expand this program to include student presenter travel to other conferences where the student is presenting their LLVM-related work. Details on this program will be published once they have been finalized.

Women in Compilers & Tools

Grace Hopper invented the first compiler, and yet women are severely underrepresented in the field of compilers and tools. At the 2015 Bay Area LLVM Developers' Meeting, we held a BoF on this topic and brainstormed ideas about what can be done. One idea was to increase LLVM awareness at technical conferences that have strong female participation. One such conference is the Grace Hopper Conference (GHC). The LLVM Foundation has submitted a proposal to present about LLVM and how to get involved with the LLVM open source community. We hope our submission is accepted, but if not, we are exploring other ways we can increase our visibility at GHC. Many of the other ideas from this BoF are being considered, and actionable plans are in progress.

In addition to these 3 programs, we will continue to support the LLVM Project's infrastructure. The
server will move to a new machine to increase performance and reliability.
We hope that you are excited about the work the LLVM Foundation will be doing in 2016. Our 2016 Plans & Budget is also available for viewing. You may also contact our COO & President, Tanya Lattner, or the LLVM Foundation Board of Directors.
April 01, 2016
There's been considerable discussion about compilers abusing undefined behavior. In response, the LLVM compiler infrastructure is rebranding and adopting a motto to make undefined behavior friendlier and less prone to abuse. The re-branding puts to rest a long-standing issue with LLVM's "dragon" logo, itself a special form of undefined behavior in its own right. The logo is now clearly a pegasus pony.

Another great side effect of this rebranding is increased security, by auto-magically closing all vulnerabilities used by a certain pony-pseudonymed hacker. These new features are enabled with the -rainbow clang option, in honor of Rainbow Dash's unary name.

A Few Examples

C++'s memory model specifies that data races are undefined behavior. LLVM will therefore supplement the Standard's happens-before relationship with an LLVM-specific happens-to-work relationship. On most architectures this will be implemented with micro-pause primitives such as x86's rep rep rep nop instruction.

Shifts by bit-width or larger will now return a normally-distributed random number. This also obsoletes rand() and std::random_shuffle.

bool now obeys sensible rules, avoiding that annoying "but what if it's not zero or one?" interview question. Further, incrementing a bool with ++ now does the right thing.

Atomic integer arithmetic is already specified to be two's complement. Regular arithmetic will therefore now also be atomic. Except when volatile, but not when volatile atomic.

NaNs will now compare equal, subnormals are free to self-classify as normal / zero / other, negative zero simply won't be a thing, IEEE-754 has been upgraded to PONY-754, and generating a signaling NaN is now guaranteed to not be quiet by being equivalent to putchar('\a'). While we're at it, none of math.h will set errno anymore. This has nothing to do with undefined behavior but seriously, errno?

Type-punning isn't a thing anymore. We're renaming it to type-pony-ing, but it doesn't do anything surprising besides throw parties. AND WHO DOESN'T LIKE PARTIES? EVEN SECURITY PEOPLE DO!

A Word From Our Sponsors

The sanitizers are great tools when dealing with undefined behavior. Use them on your tests, try them as cupcake topping! Be warned: their runtimes aren't designed to be secure and you shouldn't ship them in production code!

Cutie Marks

To address the elephant in the room: we've left the new LLVM logo's cutie mark as implementation-defined. Different instances of the logo can use their own cutie mark to illustrate their proclivities, but must clearly document them.
March 29, 2016
We are excited to announce the next release of the Intel(R) OpenMP* Runtime Library at openmprtl.org. This release aligns with Intel(R) Parallel Studio XE 2016 Composer Edition Update 3.
New Features
OpenMP* 4.5 schedule(simd:static) support (see the example after this list)
Hwloc topology discovery improved
Spin backoff mechanism fixed in lock code
Plain barrier performance improved on Intel(R) Xeon Phi
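Illustrating the schedule(simd:static) item above (the loop is an invented example; the clause itself is standard OpenMP* 4.5 syntax):

  /* schedule(simd:static) asks the runtime to align each thread's chunk
     boundaries to SIMD-width multiples, so vectorized chunks stay full. */
  void axpy(int n, float a, const float *x, float *y) {
    #pragma omp parallel for simd schedule(simd:static)
    for (int i = 0; i < n; ++i)
      y[i] += a * x[i];
  }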
Contributions
February 25, 2016
We are excited to announce the next release of the Intel(R) OpenMP* Runtime Library at openmprtl.org. This release aligns with Intel(R) Parallel Studio XE 2016 Composer Edition Update 2.
New Features:
Hwloc* 2.0 support added for affinity interface
OMPT support for Windows
Support for untied tasks
OpenMP* 4.5 doacross, taskloop, and new affinity API
Bug Fixes:
February 16, 2016
I just finished reading the B3 announcement and wanted to jot down some thoughts.
Fair warning, this post is being written in a hurry.
I’m focusing on getting down my initial reactions rather than trying for a really well thought out post.
That may follow at a later time or it may not.
The first key bit is that the goal of this effort appears to be strictly compile time, not peak performance.
Understanding that makes the effort make a lot more sense.
It still seems a bit odd to me for the compile time of your *fourth tier* JIT to be that important, but given that I'm no expert in JavaScript, I'll just accept that as a given.
In that context, I find the claimed 4.7x reduction in compile time surprisingly unexciting.
There's enough low-hanging fruit in LLVM – in particular, a better representation for "check" nodes – that I would expect something of that magnitude to be possible within the existing framework they had.
Achieving a ~5x improvement of compile time with an entirely new compiler (and all of the engineering that implies) seems slightly disappointing.
Now it’s possible (heck, even likely!) that the new architecture will allow them to further drop compile time, but still…
The performance numbers quoted were uninformative at best.
From what I can gather in the write up, the JetStream benchmark is highly influenced by compile time.
While I understand the goal (it’s a useful one), it doesn’t really say anything about the peak performance of the code generated by the two compilers.
Given that, it's really hard to tell if B3 is actually break-even with the existing LLVM backend at peak. It would have been really nice to see the same numbers with the compile time somehow removed or adjusted for.
(B3 with a sleep added to make it slower?
A longer warmup period in a modified benchmark?)
Just to be clear, I’m not saying that the numbers presented are “wrong”.
Merely that they don’t answer the question I’d most like them to.
Other things that jumped out at me:
The points about matching the IR to the source language in an effort to reduce the number of nodes (and thus memory, and time) are spot on.
If what you’re going for is compile time above all else, using an IR which closely matches your source language is absolutely the right approach.
This same general idea (remove memory/nodes where they don’t provide enough value) is what’s motivating the removal of pointer-to-pointer bitcasts in LLVM right now.
The emphasis on the importance of the "check" node (i.e. early OSR exit if a condition fails) matches our experience as well.
You can see this in Swift's IR too.
This is clearly an area where LLVM needs to improve.
I think we can do a lot better within LLVM, and I’m surprised they didn’t try to push that.
In particular, the aliasing problems mentioned could have been addressed with a custom AliasAnalysis instance.
The choice to use arrays (instead of lists) gives some interesting tradeoffs.
From a compile time perspective, a modify and compact scheme is likely a win.
Interestingly, this reminds me a lot of a mark-compact garbage collector (b3’s layout) vs a standard malloc/free allocator (llvm’s layout).
Given typical lifetimes in a compiler (short!), the collector approach is likely to be the right one.
It does raise some interesting challenges though: pointer equality can no longer be used, trivial dead code elimination is no longer trivial (which complicates various other transforms), and transforms have to deal with non-canonical forms due to extra identity nodes.
It’ll be really interesting to see where b3 goes on this basis alone.
January 04, 2016
I was talking about the friendly C proposal tonight with a friend, and decided to jot down some thoughts in a sharable format.
I believe the idea of a friendly C variant is entirely feasible, but it poses an incredibly challenging design problem.
Every change considered needs to be validated against a deep knowledge of the implementation of the associated compiler, runtime environment, and the underlying hardware.
As a simple example, let's consider trying to establish semantics for stray (i.e. out of bounds) reads and writes.
We can start by trying to define what happens for a stray read.
That’s fairly easy, we can simply return an undefined value.
We could even be a bit more restrictive and say that the value must be one which is written to that address by some part of the program.
(The vagueness in that last bit is to allow concurrent execution reordering.
However, we've accidentally required atomic reads and writes, since we disallowed word tearing.
Is that a good thing or not?
There’s a cost to that, but maybe it’s a cost we’re willing to pay.
Or maybe not…)
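To make the tearing concern concrete, here is a sketch of my own (not from the original discussion): a plain 64-bit load racing with a concurrent write.

  #include <stdint.h>

  /* If another thread (perhaps via a stray write) updates 'shared'
     concurrently and this load is not atomic, the reader may observe a
     "torn" value mixing old and new halves on some targets. Requiring the
     read to return "some value written to that address" quietly rules
     that out, i.e. it demands atomic access. */
  uint64_t shared;

  uint64_t read_shared(void) {
    return shared;
  }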
Now let’s consider how to handle stray writes.
We could simply define them to be erroneous, but that simply gets us back to undefined behavior in C/C++.
We’re trying to avoid that.
We either need to detect them, or provide a reasonable semantics.
Detecting arbitrary stray writes is a very hard problem.
We can easily handle specific categories of stray writes through instrumentation techniques, but detecting an arbitrary stray write requires something like full memory-safety checking
(or possibly even more expensive checks).
I doubt anyone is willing to pay 2x performance for their C code to be more friendly.
If they were, why are they writing in C?
The challenge with having defined stray writes is what does a particular read return?
Does it return the last written value to a particular address?
Or the last value written to the particular field of the given object?
With out of bounds writes, these are not necessarily the same.
It’s very tempting to have the read return the last value written to the underlying address, but that introduces a huge problem.
In particular, it breaks essentially all load-load forwarding.
int foo(int *p_int, float *p_float) {
  int a = *p_int;
  *p_float = 0.0;
  return a - *p_int;
}
In the example above, your normal C compiler could return "0" because it assumes the intervening write can't change the value at p_int.
An implementation of a friendly C variant with the semantics we've proposed could not.
In practice, this is probably unacceptable from a performance standpoint: memory optimization (load-load forwarding and associated optimizations) is a huge part of what a normal C/C++ compiler does.
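For illustration, here is my sketch (not code from the post) of what that forwarding does to foo, and why the compiler can fold the result to 0:

  int foo_after_forwarding(int *p_int, float *p_float) {
    int a = *p_int;
    *p_float = 0.0;
    /* Load-load forwarding: assuming the float store cannot alias *p_int,
       the second load of *p_int is replaced by 'a', so this folds to 0. */
    return a - a;
  }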
If we want to avoid that problem, we could try to be more subtle in our definition.
Let's say we instead defined a read as returning either the last value written to that field (i.e. by an in-bounds write) or to the underlying memory address (i.e. by a stray write).
We still have the problem of requiring atomic memory access, but we seem to have allowed the compiler optimization we intended.
The problem with this definition is that we’ve introduced a huge amount of complexity to our language specification and compiler.
We now have to have separate definitions of both our objects, their underlying addresses, and all the associated implementation machinery.
Another approach would be to define a read as returning either the last value written to the field (if no stray write has occurred to that address) or an undefined value (if a stray write to that address has occurred).
Is that friendly enough?
Moreover, what happens if we improve our ability to detect stray writes?
Are we allowed to make that write suddenly fail?
Is a program which functions only because of a stray write correct?
(Before you dismiss this as ridiculous, I personally know of an emergency software release that did nothing but reintroduce a particular stray memory write in a C++ program because it happened to restore behavior that a client had been relying on for many many years.)
Hopefully, I’ve given you a hint of the complexities inherent in any friendly C proposal.
These are the same complexities involved in designing any new language.
If anything designing a workable friendly-C proposal is harder than designing a new language.
At least with a new language you'd have the freedom to change other aspects of the language to avoid having to deal with such garbage cases. In practice, that's often the much easier approach.
November 13, 2015
The bugpoint tool helps reduce an input IR file while preserving some interesting behavior, usually a compiler crash or a miscompile. Multiple strategies are involved in the reduction of the test case (shuffling instructions, modifying the control flow, etc.), but because it is oblivious to the LLVM passes and the individual backend specificities, "it may appear to do stupid things or miss obvious simplifications", as stated in the documentation. The documentation
gives some insight into the strategies bugpoint may employ, but the details are beyond the scope of this post. Read on to learn how you can use the power of bugpoint to solve some non-obvious problems.

Bugpoint Interface Considered Harmful

Bugpoint is a powerful tool for reducing your test case, but its interface can lead to frustration (as stated in the documentation: "bugpoint can be a remarkably useful tool, but it sometimes works in non-obvious ways"). One of the main issues seems to be that bugpoint is, ironically, too advanced! It operates under three modes and switches automatically among them to solve different kinds of problems: crash, miscompilation, or code generation (see the documentation
for more information on these modes). However, it is not always obvious beforehand which mode will be activated and which strategy bugpoint is actually using.

I found that for most of my uses, I don't want the advanced bugpoint features that deal with pass ordering, for example, and I don't need bugpoint to detect which mode to operate in and switch automatically. For most of my usage, the `compile-custom` option is perfectly adequate: similar to `git bisect`, it allows you to provide a script to bugpoint. This script is a black box for bugpoint; it needs to accept a single argument (the bitcode file to process) and needs to return 0 if the bitcode does not exhibit the behavior you're interested in, or a non-zero value otherwise. Bugpoint will apply multiple strategies to reduce the test case, and will call your custom script after each transformation to validate whether the behavior you're looking for is still exhibited. The invocation for bugpoint is the following:

$ ./bin/bugpoint -compile-custom -compile-command=./check.sh -opt-command=./bin/opt my_test_case.ll

The important parts are the two options -compile-custom and -compile-command=path_to_script.sh, which indicate to bugpoint that it should use your own script to process the file. The other important part is the -opt-command option, which should point to the correct opt that will be used to reduce the test case. Indeed, by default bugpoint will search the path for opt and may use an old system one that won't be able to process your IR properly, leading to some curious error messages:

*** Debugging code generator crash!
Checking for crash with only these blocks:
  diamond .preheader .lr.ph .end
: error: Invalid type for value
simplifycfg failed!

Considering such a script `check.sh`, running it with your original test case this way:

$ ./check.sh my_test_case.ll && echo "NON-INTERESTING" || echo "INTERESTING"

should display INTERESTING before you try to use it with bugpoint, or you may very well be surprised. In fact, bugpoint considers the script as a compile command. If you start with a NON-INTERESTING test case and feed it to bugpoint, it will assume that the code compiles correctly, and will try to assemble it, link it, and execute it to get a reference result. This is where bugpoint behavior can be confusing when it automatically switches modes, leaving the user with a confusing trace. A correct invocation should lead to a trace such as:

./bin/bugpoint
  -compile-custom
  -compile-command=./check.sh
  -opt-command=./bin/opt slp.ll
Read input file      : 'slp.ll'
*** All input ok
Initializing execution environment: Found command in: ./check.sh
Running the code generator to test for a crash:
Error running tool:
  ./check.sh bugpoint-test-program-1aa0e1d.bc
*** Debugging code generator crash!
Checking to see if we can delete global inits: <crash>
*** Able to remove all global initializers!
Checking for crash with only these blocks:
  .lr.ph6.preheader .preheader .lr.ph.preheader .lr.ph .backedge
  ._crit_edge.loopexit... <11 total>: <crash>
Checking for crash with only these blocks: .preheader .backedge .lr.ph6.preheader:
Checking for crash with only these blocks: .lr.ph ._crit_edge:
...
Checking instruction:
  store i8 %16, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 15), align 1, !tbaa !2
*** Attempting to perform final cleanups: <crash>
Emitted bitcode to 'bugpoint-reduced-simplified.bc'

In practice, the ability to write a custom script is very powerful. I will go over a few use cases I recently used bugpoint with.

Search For a String in the Output

I recently submitted a patch (http://reviews.llvm.org/D14364) for a case where the loop vectorizer didn't kick in on a quite simple test case. After fixing the underlying issue, I needed to submit a test with my patch. The original IR was a few hundred lines. Since I believe it is good practice to reduce test cases as much as possible, bugpoint is often my best friend. In this case, after my patch the analysis prints "Memory dependences are safe with run-time checks" on the output. Having compiled `opt` with and without my patch and copied each version into `/tmp/`, I wrote this shell script:

#!/bin/bash
/tmp/opt.original -loop-accesses -analyze $1 | grep "Memory dependences are safe"
res_original=$?
/tmp/opt.patched -loop-accesses -analyze $1 | grep "Memory dependences are safe"
res_patched=$?
[[ $res_original == 1 && $res_patched == 0 ]] && exit 1
exit 0

It runs the bitcode supplied as an argument to the script (the $1 above) through each opt and uses grep to check for the presence of the expected string in the output. When grep exits, $? contains 1 if the string is not present in the output. The reduced test case is valid if the original opt didn't produce the expected analysis but the new opt did.

Reduce While a Transformation Has an Effect

In another case (http://reviews.llvm.org/D13996), I patched the SLP vectorizer and wanted to reduce the test case so that it didn't vectorize before my changes but vectorizes after:

#!/bin/bash
set -e
/tmp/opt.original -slp-vectorizer -S > /tmp/original.ll $1
/tmp/opt.patched -slp-vectorizer -S > /tmp/patched.ll $1
diff /tmp/original.ll /tmp/patched.ll && exit 0
exit 1

The use of a custom script offers flexibility and allows you to run any complex logic to decide whether a reduction is valid or not. I have used it in the past to reduce crashes on a specific assertion while avoiding reductions leading to a different crash, or to reduce while tracking instruction-count regressions or any other metric.
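For instance, a script for that assertion-specific crash case could look like this sketch (the assertion text and the -O2 pipeline here are placeholders, not taken from the post):

#!/bin/bash
# Interesting (non-zero exit) only when opt crashes with one specific
# assertion, so the reduction cannot drift toward a different crash.
./bin/opt -O2 -disable-output "$1" 2> /tmp/opt.err
grep -q "MySpecificAssertion" /tmp/opt.err && exit 1
exit 0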
Just Use FileCheck

LLVM comes with a pattern-matching verification tool (FileCheck) that the tests use intensively. You can annotate your original test case and write a script that reduces it for your patch. Let's take an example from the public LLVM repository, commit r252051 "[SimplifyCFG] Merge conditional stores". The associated test is test/Transforms/SimplifyCFG/merge-cond-stores.ll, and it already contains all the checks we need; let's try to reduce it. For this purpose you'll need to process one function at a time, or bugpoint may not produce what you expect: because the check will fail for one function, bugpoint can do any transformation to another function and the test would still be considered "interesting". Let's extract the function test_diamond_simple from the original file:

$ ./bin/llvm-extract -func=test_diamond_simple test/Transforms/SimplifyCFG/merge-cond-stores.ll -S > /tmp/my_test_case.ll

Then check out and compile opt for revisions r252050 and r252051, and copy them to /tmp/opt.r252050 and /tmp/opt.r252051. The check.sh script is then based on the CHECK lines in the original test case:

#!/bin/bash
# Process the test before the patch and check with FileCheck,
# this is expected to fail.
/tmp/opt.r252050 -simplifycfg -instcombine -phi-node-folding-threshold=2 -S < $1 | ./bin/FileCheck merge-cond-stores.ll
original=$?
# Process the test after the patch and check with FileCheck,
# this is expected to succeed.
/tmp/opt.r252051 -simplifycfg -instcombine -phi-node-folding-threshold=2 -S < $1 | ./bin/FileCheck merge-cond-stores.ll
patched=$?
# The test is interesting if FileCheck failed before and
# succeeded after the patch.
[[ $original != 0 && $patched == 0 ]] && exit 1
exit 0

I intentionally selected a very well-written test to show you both the power of bugpoint and its limitations. Look at the function we just extracted into my_test_case.ll:

; CHECK-LABEL: @test_diamond_simple
; This should get if-converted.
; CHECK: store
; CHECK-NOT: store
; CHECK: ret
define i32 @test_diamond_simple(i32* %p, i32* %q, i32 %a, i32 %b) {
entry:
  %x1 = icmp eq i32 %a, 0
  br i1 %x1, label %no1, label %yes1

yes1:
  store i32 0, i32* %p
  br label %fallthrough

no1:
  %z1 = add i32 %a, %b
  br label %fallthrough

fallthrough:
  %z2 = phi i32 [ %z1, %no1 ], [ 0, %yes1 ]
  %x2 = icmp eq i32 %b, 0
  br i1 %x2, label %no2, label %yes2

yes2:
  store i32 1, i32* %p
  br label %end

no2:
  %z3 = sub i32 %z2, %b
  br label %end

end:
  %z4 = phi i32 [ %z3, %no2 ], [ 3, %yes2 ]
  ret i32 %z4
}

The transformation introduced in this patch allows the stores in the true branches yes1 and yes2 to be merged:

declare void @f()

define i32 @test_diamond_simple(i32* %p, i32* %q, i32 %a, i32 %b) {
entry:
  %x1 = icmp eq i32 %a, 0
  %z1 = add i32 %a, %b
  %z2 = select i1 %x1, i32 %z1, i32 0
  %x2 = icmp eq i32 %b, 0
  %z3 = sub i32 %z2, %b
  %z4 = select i1 %x2, i32 %z3, i32 3
  %0 = or i32 %a, %b
  %1 = icmp eq i32 %0, 0
  br i1 %1, label %3, label %2

; <label>:2                                       ; preds = %entry
  %simplifycfg.merge = select i1 %x2, i32 %z2, i32 1
  store i32 %simplifycfg.merge, i32* %p, align 4
  br label %3

; <label>:3                                       ; preds = %entry, %2
  ret i32 %z4
}

The original code seems pretty minimal: the variable and block names are explicit, it is easy to follow, and you probably wouldn't think about reducing it. For the exercise, let's have a look at what bugpoint can do for us here:

define void @test_diamond_simple(i32* %p, i32 %b) {
entry:
  br i1 undef, label %fallthrough, label %yes1

yes1:                                             ; preds = %entry
  store i32 0, i32* %p
  br label %fallthrough

fallthrough:                                      ; preds = %yes1, %entry
  %x2 = icmp eq i32 %b, 0
  br i1 %x2, label %end, label %yes2

yes2:                                             ; preds = %fallthrough
  store i32 1, i32* %p
  br label %end

end:                                              ; preds = %yes2, %fallthrough
  ret void
}

Bugpoint figured out that the no branches were useless for this test and removed them. The drawback is that bugpoint also has a tendency to introduce undef or unreachable here and there, which can make the test more fragile and harder to understand.
Not There Yet: Manual Cleanup

At the end of the reduction, the test is small but probably not ready to be submitted with your patch as-is. Some cleanup is likely still needed: for instance, bugpoint won't convert invokes into calls, or remove metadata, TBAA information, personality functions, etc. We also saw earlier that bugpoint can modify your test in unexpected ways, adding undef or unreachable. You will also probably want to rename the variables to end up with a readable test case. Fortunately, having the check.sh script at hand helps in this process, since you can manually modify your test and repeatedly run the same command:

$ ./check.sh my_test_case.ll && echo "NON-INTERESTING" || echo "INTERESTING"

As long as the result is INTERESTING, you know you still have a valid test and can continue with your cleanup. Keep in mind that bugpoint can do far more, but hopefully this subset will be helpful to those still struggling with its command-line options. Finally, I'm grateful to Manman Ren for her review of this post.
October 15, 2015
We are excited to announce the next release of the Intel(R) OpenMP* Runtime Library at openmprtl.org. This release aligns with Intel(R) Parallel Studio XE 2016 Composer Edition Update 1.
New Features
Added dynamic/hinted lock implementation that supports OpenMP* locks with hints
Disabled monitor thread when KMP_BLOCKTIME=infinite
Improved stack protection with safe C library for string/memory operations
