Nowadays, STL is an essential component in almost every CPP project. That is why several questions about STL are asked in a technical interview. Especially, the std::vector is a popular subject. In this post, we gonna check the codes related to std::vector‘s growth, which is a hot topic in STL. The term “growth” in std::vector means an event to increase a size of instance by some actions. The action would be inserting an element (e.g. push_back()) or tuning its size (e.g. resize()). Some of us say “When the growth happens, its size become twice.”, but some of others say “No, it is exactly 3/2 times.”. Well…both of saying are not wrong. Let us find out why it is.
As you can see, there is a branch on returning the function. When _Mylast != _My_data._Myend is true, the growth not happens. Because the logic in the _Emplace_back_with_unused_capacity() does not reallocate memory, but reuse unused memory. FYI, values about _Mypair have the relationship like below:
1
_Compressed_pair<_Alty, _Scary_val> _Mypair;
1 2 3 4 5
// https://github.com/microsoft/STL/blob/c12089e489c7b6a3896f5043ed545ac8d1870590/stl/inc/xmemory template <class_Ty1, class_Ty2, bool = is_empty_v<_Ty1> && !is_final_v<_Ty1>> class _Compressed_pair final : private _Ty1 { // store a pair of values, deriving from empty first public: _Ty2 _Myval2;
1 2 3 4 5 6 7 8 9 10 11 12
// CLASS TEMPLATE vector template <class_Ty, class_Alloc = allocator<_Ty>> class vector { // varying size array of values private: template <class> friendclass _Vb_val; friend _Tidy_guard<vector>;
using _Alty = _Rebind_alloc_t<_Alloc, _Ty>; ... using _Scary_val = _Vector_val<conditional_t<_Is_simple_alloc_v<_Alty>, _Simple_types<_Ty>, _Vec_iter_types<_Ty, size_type, difference_type, pointer, const_pointer, _Ty&, const _Ty&>>>;
pointer _Myfirst; // pointer to beginning of array pointer _Mylast; // pointer to current end of sequence pointer _Myend; // pointer to end of array };
Since the elements are placed in sequential memory address, _Mylast - _Myfirst means “currently used size”.
As a result, _Mylast != _My_data._Myend is true when _Mylast < _Myend is true. That is why reallocation not happens. Get back to emplace_back() code. According to those upper reasons, now we need to focus on _Emplace_reallocated() function.
As you can see the codes, deallocation and reallocation happen. The variable _Newcapacity determines the size of memory will be reallocated. Let us check the function _Calculate_growth().
if (_Geometric < _Newsize) { return _Newsize; // geometric growth would be insufficient }
return _Geometric; // geometric growth is sufficient }
There are three return statement in the function.
First, when the current available size is bigger than 2/3 times of maximum size.
For instance, a maximum value of int type is +2,147,483,647 and 2/3 times of value is +1,431,655,764.666... ≒ +1,431,655,765. Let us put them in the expression. if (1431655765 > 2147483647 - 1431655765 / 2) will be false, but how about if _Oldcapacity = +1,431,655,766 ? if (1431655766 > 2147483647 - 1431655766 / 2) will be true. In this case, new size will be forced as the maximum size.
Second, when the current available size is less than 2.
For instance, when the _Oldcapacity is in {0, 1} the expression const size_type _Geometric = _Oldcapacity + _Oldcapacity / 2; will be the same with _Oldcapacity. In this case, new size will be forced as _Newsize, which is passed by _Oldsize + 1 in _Emplace_reallocate().
_Oldcapacity
Calculation
0
0 + 0 / 2 = 0
1
1 + 1 / 2 = 1
Third, other cases of the current available size.
The _Geometric will have 3/2 times of _Oldcapacity. That is why the 3/2 times of growth happens in MSVC. And now you understand why new size has to be set by maximum value when the _Oldcapacity is bigger than 2/3 times of maximum size.
The resize() has a similar flow. Let us find out.
1 2 3 4
_CONSTEXPR20_CONTAINER voidresize(_CRT_GUARDOVERFLOW const size_type _Newsize){ // trim or append value-initialized elements, provide strong guarantee _Resize(_Newsize, _Value_init_tag{}); }
// if _Newsize == _Oldsize, do nothing; avoid invalidating iterators }
The resize() can trim or append available memory. Trimming happens when you call resize() with smaller value than current available size. Appending happens when you call resize() with greater value than current available size. We go to _Resize_reallocate().
// [23.2.4.3] modifiers /** * @brief Add data to the end of the %vector. * @param __x Data to be added. * * This is a typical stack operation. The function creates an * element at the end of the %vector and assigns the given data * to it. Due to the nature of a %vector this operation can be * done in constant time if the %vector has preallocated space * available. */ void push_back(const value_type& __x) { if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) { _GLIBCXX_ASAN_ANNOTATE_GROW(1); _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x); ++this->_M_impl._M_finish; _GLIBCXX_ASAN_ANNOTATE_GREW(1); } else _M_realloc_insert(end(), __x); }
Here are two cases. First, when an available memory exists. Second, otherwise.
void _M_swap_data(_Vector_impl_data& __x) _GLIBCXX_NOEXCEPT { // Do not use std::swap(_M_start, __x._M_start), etc as it loses // information used by TBAA. _Vector_impl_data __tmp; __tmp._M_copy_data(*this); _M_copy_data(__x); __x._M_copy_data(__tmp); } }; ... _Vector_impl _M_impl;
std::vector in GCC has internal indicators like std::vector in MSVC. So we should focus on _M_realloc_insert() function.
#if __cplusplus >= 201103L template<typename _Tp, typename _Alloc> template<typename... _Args> void vector<_Tp, _Alloc>:: _M_realloc_insert(iterator __position, _Args&&... __args) #else template<typename _Tp, typename _Alloc> void vector<_Tp, _Alloc>:: _M_realloc_insert(iterator __position, const _Tp& __x) #endif { const size_type __len = _M_check_len(size_type(1), "vector::_M_realloc_insert"); pointer __old_start = this->_M_impl._M_start; pointer __old_finish = this->_M_impl._M_finish; const size_type __elems_before = __position - begin(); pointer __new_start(this->_M_allocate(__len)); pointer __new_finish(__new_start); __try { // The order of the three operations is dictated by the C++11 // case, where the moves could alter a new element belonging // to the existing vector. This is an issue only for callers // taking the element by lvalue ref (see last bullet of C++11 // [res.on.arguments]). _Alloc_traits::construct(this->_M_impl, __new_start + __elems_before, #if __cplusplus >= 201103L std::forward<_Args>(__args)...); #else __x); #endif __new_finish = pointer();
Hoo, it is too long. We do not have to look into whole code, but the variable __len. The variable is used for reallocation. And it is set by _M_check_len().
1 2 3 4 5 6 7 8 9 10
// Called by _M_fill_insert, _M_insert_aux etc. size_type _M_check_len(size_type __n, constchar* __s) const { if (max_size() - size() < __n) __throw_length_error(__N(__s));
The code throw an error when current size is the same with maximum size because the function was called as _M_check_len(size_type(1), ...). Otherwise, new size will be set by 2 times of current size. Except for when current size is 0.
Current size
Calculation
0
1 = 0 + max(0, 1)
1
2 = 1 + max(1, 1)
2
4 = 2 + max(2, 1)
And, returns maximum size when underflow or overflow happens. Otherwise, returns new size calculated as 2 times of current size.
Next, check the resize() in GCC.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
/** * @brief Resizes the %vector to the specified number of elements. * @param __new_size Number of elements the %vector should contain. * * This function will %resize the %vector to the specified * number of elements. If the number is smaller than the * %vector's current size the %vector is truncated, otherwise * default constructed elements are appended. */ void resize(size_type __new_size) { if (__new_size > size()) _M_default_append(__new_size - size()); elseif (__new_size < size()) _M_erase_at_end(this->_M_impl._M_start + __new_size); }
We can see the resize() in GCC also do trimming and appending. (Interestingly, nothing happens when __new_size is equal to current size.) So, we should focus on _M_default_append() function.
It is long one, too. What is __navail ? It seems meaning of Number of AVAILable memory, not the Not AVAILable memory. So, we can see the memory is reused when if (__navail >= __n) is true. Otherwise, reallocation happens. Oh, Hi. We meet _M_check_len() again. Then, new size will be 2 times of current size.
Wrap-up
Common
Try to recycle memory as possible as can. (e.g. reuse available memory in push_back() logic.)
Care about underflow and overflow.
Have internal indicators for {First, Current, End}
Currently allocated size = End - First
Currently used size = Current - First
Currently available size = End - Current
MSVC
Growth happens with 3/2 times of amount. (in normal case)
GCC
Growth happens with 2 times of amount. (in normal case)