1-13 of 13 results

  • www.alcadvisors.com
  • 2
  • 1
  • 14
ALC Advisors is a boutique advisory and investment firm specializing in China and providing the highest quality senior level attention to all our clients, while avoiding the conflict of interest problem inherent in large firms... ALC's professionals..

Relevance: 15.268833
  • hanlab.mit.edu
  • 1
  • 20
We introduce QoQ, a W4A8KV4 quantization algorithm with 4-bit weight, 8-bit activation, and 4-bit KV cache, and implement QServe inference library that improves the maximum achievable serving throughput of Llama-3-8B by 1.2× on A100, 1.4× on L40S;..

Relevance: 10.638607
  • hanlab.mit.edu
  • 22
Quantization can accelerate large language model (LLM) inference. Going beyond INT8 quantization, the research community is actively exploring even lower precision, such as INT4. Nonetheless, state-of-the-art INT4 quantization techniques only..

Relevance: 10.01945
  • hanlab.mit.edu
  • 22
We introduce QoQ, a W4A8KV4 quantization algorithm with 4-bit weight, 8-bit activation, and 4-bit KV cache, and implement QServe inference library that improves the maximum achievable serving throughput of Llama-3-8B by 1.2× on A100, 1.4× on L40S;..

Relevance: 9.98567
  • hanlab.mit.edu
  • 22
Tiny machine learning (TinyML) is a new frontier of machine learning. By squeezing deep learning models into billions of IoT devices and microcontrollers (MCUs), we expand the scope of AI applications and enable ubiquitous intelligence. However,..

Relevance: 9.98567
  • hanlab.mit.edu
  • 22
We introduce QoQ, a W4A8KV4 quantization algorithm with 4-bit weight, 8-bit activation, and 4-bit KV cache, and implement QServe inference library that improves the maximum achievable serving throughput of Llama-3-8B by 1.2× on A100, 1.4× on L40S;..

Relevance: 9.98567
  • hanlab.mit.edu
  • 22
The attention mechanism is becoming increasingly popular in Natural Language Processing (NLP) applications, showing better performance than convolutional and recurrent architectures. However, general-purpose platforms such as CPUs and GPUs are..

Relevance: 9.586961
  • hanlab.mit.edu
  • 22
We address the challenging problem of efficient inference across many devices and resource constraints, especially on edge devices. Conventional approaches either manually design or use neural architecture search (NAS) to find a specialized neural..

Relevance: 9.586961
  • hanlab.mit.edu
  • 1
  • 22
Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide real-time user experience and ensure user safety...

Relevance: 9.586961
  • texasmakes.tamu.edu
  • 2
  • 1
  • 61
The institute has been successful in securing sponsored research projects and grants since its inception, and in recent years, it has played a central role in the coordination of various manufacturing-focused initiatives for Texas A&M Engineering...

Relevance: 4.1538305
  • muidsi.missouri.edu
  • 101
  • 1
  • 71
To have a global impact in advancing computational research in biology and medicine, the stakeholders across the University of Missouri System believe it is critical to the mission of the university and the development of the community to offer a..

Relevance: 3.7736354
  • www.iaiad.com
  • 2
  • 1
  • 226
Over the past 24 years, IAI AWARDS (once called International Advertising Awards) has been a well-known non-profit Chinese advertising and marketing award organization. In 2016, IAI was redefined as an international idea creativity awards that..

Relevance: 2.1118455
  • www.rle.mit.edu
  • 715
  • 48
  • 629
The Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT) was the first of the Institute's great modern interdepartmental academic research centers. Today, we are one of MIT's largest such organizations, and the..

Relevance: 0.78586763